Python Email Libraries, part 1: POP3

Some very useful business software connects with and interacts with email in various ways. If you are building or working with such software, you might want to know how Python accomplishes these tasks. This article series discusses how to use the email libraries built into Python. In this first part, POP3 is covered.

Introduction

Many software systems need to connect to email and provide services that interact with it in different ways. Perhaps you’re writing software for a bulletin board that must notify users when new responses are posted in certain topics, or a piece of scheduling software that must notify users when new engagements are assigned to them. There are, of course, many further applications for integrating business or Web software with email.

Integrating with email offers advantages from both business and portability perspectives. From a business perspective, it lets users treat their email accounts as a central place to receive notifications and information about a wide range of business objectives. From a portability viewpoint, email adds a layer of abstraction, making it possible for a user to receive alerts from your software in any way they choose: via a classic email client, a Web client from a public computer or home, or even a cell phone or BlackBerry capable of accessing standard email protocols.
 
This article series will discuss how to use the email libraries built into Python. I will describe how to access email on both POP and IMAP servers, how to parse this mail into easily usable data structures, and how to create email items and then send them through an SMTP server.

The POP Protocol

Overall, POP is a relatively simple protocol. Unlike IMAP, for instance, it has no notion of multiple message folders; it simply allows the client to connect and download from a list of messages available on the server. This means that the tasks available while connected to a POP server are relatively few: you can list the messages on the server, download either all or part of a message, and delete messages from the server when they no longer need to stay there.

Originally, messages were not intended to remain on the server indefinitely, as they do in most contemporary email systems. POP clients were designed to connect to the server, list the messages, download those the user wanted to read, and delete the downloaded messages. Nowadays, with most email accounts accessible over the Web as well as through a normal email client, this is no longer the preferred manner of use. Instead, most clients download a message and leave it on the server so that it can be reached from more than a single PC.

Opening a POP Connection

The first step in downloading messages from the server is to open a connection to that server and authenticate with a username and password. The objects needed to connect to a POP server are contained in the poplib module, which is part of Python’s standard library. This module contains the POP3 class, which does the actual work of connecting to and communicating with the remote server. The task is really pretty simple:

 from poplib import POP3
 # …
 # Open the network connection (no login yet)
 server = POP3("123.213.112.23")
 print(server.getwelcome())
 # Log in: username first, then password
 print(server.user("user"))
 print(server.pass_("password"))

The first line in this block imports the necessary objects from the library. Next, the code creates a new POP3 object to connect with the server at 123.213.112.23. As you can tell from the later lines, all this does is open the basic network connection to the server; you must still log in with a username and password. 

After creating the server object, we get the welcome string from the server. Some server administrators use this to disseminate information, so it can be a useful thing to know how to get. The last two lines of this code log in with a specific username and password. The second-to-last line sends the username and the last line sends the password.

This example uses the older, relatively insecure POP authentication, which sends the password in clear text. Python provides support for two additional authentication schemes: both APOP and RPOP are implemented in the standard poplib module. Which of these schemes to use depends on the specific server implementation to which the client is connecting.
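
As a rough sketch of how a client might choose between schemes, the helper below tries APOP first and falls back to the plain username and password login if the server refuses it. The function name log_in is hypothetical; apop, user, pass_ and error_proto come from poplib. Whether a fallback is acceptable depends on your own security requirements, since the fallback sends the password unencrypted.

```python
import poplib

def log_in(server, username, password):
    """Try APOP, then fall back to USER/PASS. Returns the scheme used.

    `server` is assumed to be an already connected poplib.POP3 object.
    """
    try:
        # APOP sends an MD5 digest instead of the clear-text password.
        # poplib raises error_proto if the welcome banner carried no
        # APOP timestamp, or if the server rejects the digest.
        server.apop(username, password)
        return "APOP"
    except poplib.error_proto:
        # Fallback: the password crosses the network in clear text.
        server.user(username)
        server.pass_(password)
        return "USER/PASS"
```

With the server object from the earlier example, this would be called as log_in(server, "user", "password").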

Getting Message Info

Once you’re connected to the server, the natural next step is to download a list of the messages available on it. This is done using the list operation. This method returns a Python tuple whose first element is the server’s response and whose second element is a list of strings (byte strings in Python 3), each of which contains a message number and a message size. A third element gives the total size of the response. An example of how you might do this follows:

 messagesInfo = server.list()[1]
 numMessages = len(messagesInfo)

One important thing to note is that the list method can optionally take a message number as its argument, in which case the server returns the information for that message alone. The details are described in the POP RFC.
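
To make the shape of this data concrete, here is a minimal sketch that turns the lines of a listing into (number, size) pairs. The function name parse_listing and the sample data are made up for illustration; the only assumption about poplib is that each entry looks like b"1 120", as described above.

```python
def parse_listing(lines):
    """Turn LIST lines such as b"1 120" into (number, size) tuples."""
    parsed = []
    for line in lines:
        # First field is the message number, second the size in bytes.
        number, size = line.split()[:2]
        parsed.append((int(number), int(size)))
    return parsed

# Made-up data in the shape that server.list()[1] is expected to have
sample = [b"1 120", b"2 4500", b"3 98000"]
print(parse_listing(sample))  # [(1, 120), (2, 4500), (3, 98000)]
```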

Protocol Definition Concerns
 
One thing to note regarding the POP3 library concerns the list operation in the first line of the above code. As you can see, it pulls the second element off of the result to work with and ignores the first. This is because the methods in the POP3 library that fetch multi-line data return a tuple, and the first element of that tuple is the server’s response to the command. In POP3 this is a string beginning with “+OK” or “-ERR”, telling you whether or not the server was able to process the request.

Throughout this article, I will generally assume that the commands we send to the server will work correctly in order to focus more on the actual workings of the POP library. However, when you work with this library in any sort of production-level software, the server’s response is invaluable for error detection and recovery. If you get no data back from a request, you can check the response to find out whether there really is no data or whether your request was in some way flawed. Note that poplib itself reports a “-ERR” reply by raising the poplib.error_proto exception, so production code should be prepared to catch it. If you want to know the exact possibilities for this server response field, they are defined in RFC 1725 (since superseded by RFC 1939).
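
A minimal sketch of this kind of error handling might look like the following. The helper name list_messages is hypothetical; it relies only on poplib raising error_proto when the server answers with “-ERR”, and on list returning a (response, lines, octets) tuple.

```python
import poplib

def list_messages(server):
    """Return the LIST lines, or None if the server reported an error."""
    try:
        response, lines, octets = server.list()
    except poplib.error_proto as err:
        # The server answered -ERR; the exception carries its reply.
        print("Server refused LIST:", err)
        return None
    return lines
```

Returning None for a failure and an empty list for an empty mailbox keeps the two cases distinguishable for the caller.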

Getting Messages from the Server

After you have downloaded the list of messages from the server, you can now begin to download the messages themselves. An example of how you might do this follows:

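 emails = []
 for msg in messagesInfo:
  # Each entry looks like b"1 120": message number, then size in bytes
  msgNum = int(msg.split()[0])
  msgSize = int(msg.split()[1])
  if msgSize < 20000:
   # retr() returns (response, lines, octets); keep only the lines
   message = server.retr(msgNum)[1]
   message = b"\n".join(message)
   emails.append(message)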

Overall, this code loops through each element of the message info that was downloaded in the previous block of code. It splits each of those entries into the message number and the message size. Note how the split method simplifies this operation. For each message whose size is less than 20 kilobytes (sizes are returned in raw bytes, hence the 20000 rather than 20), it downloads the message and strips off the server response. The actual data of a downloaded message is given to you as a list of strings, where each string contains one line of the message.

We use the join method to put all of these strings together into one large string, with a newline character between consecutive lines. We then append this string to a list containing all of the downloaded messages. This data structure could then be used with Python’s built-in email parsing library to create an easily accessible data structure for reading the header fields of each message. The actual process for doing this will be covered in a later article.
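
As a brief preview only, and assuming Python 3, the joined message can be handed to the standard email package. The helper name to_message and the sample lines are invented for illustration; message_from_bytes is part of the email package.

```python
from email import message_from_bytes

def to_message(lines):
    """Join the lines returned by retr() and parse them into a Message."""
    raw = b"\n".join(lines)
    return message_from_bytes(raw)

# Invented lines in the shape that server.retr(n)[1] is expected to have
sample_lines = [
    b"From: alice@example.com",
    b"Subject: Hello",
    b"",
    b"Hi Bob.",
]
msg = to_message(sample_lines)
print(msg["Subject"])  # Hello
```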

Advanced Topics: Network Latency and Threading

Generally, that’s about it. RFC 1725 (since superseded by RFC 1939) contains further information on requesting info for one specific message, which is a useful capability, although sometimes it is easier to simply grab all the message info and process it locally. An important thing to note about the networking calls described here is that they are all “blocking” calls. This means that execution of the main program thread is suspended while it waits for a response from the server.

If you are using these networking calls within the context of a command line script, or even a Web script, this will probably not matter. Generally, there will be a good reason for having the program’s main execution thread wait for the network calls to complete before moving on to its next step. However, if you are using these calls in a program served by a GUI front end, this blocking can be very frustrating for a user. 

Network communication time-outs are usually on the order of 30-60 seconds, and a call made in the main execution stream of a GUI program locks the GUI until that call completes. If the call happens to fail or the server doesn’t respond, the GUI can therefore stay locked for 30-60 seconds while it waits for the computer’s TCP/IP stack to report whether or not the network communication succeeded. This behavior is usually not considered acceptable for a GUI.

The solution to this problem lies in separate execution threads for the network operations. In outline, the event handler for a GUI component that needs to trigger a network operation should spin off a separate execution thread that actually performs the network call. A status bar update can be used to notify the user that the network operation is in progress. The separate thread then performs the network call and brings about whatever change needs to be made to the GUI (many toolkits expect the update itself to be handed back to the main thread). This prevents the GUI from becoming unresponsive while the network call is proceeding.
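
The outline above can be sketched with the standard threading and queue modules. The names run_in_background and poll_results are hypothetical, and the sketch assumes a GUI toolkit that can call poll_results periodically from its main thread, for example from a timer; it is one possible arrangement, not the only one.

```python
import queue
import threading

def run_in_background(task, results):
    """Run task() on a worker thread and put its outcome on `results`."""
    def worker():
        try:
            results.put(("ok", task()))
        except Exception as err:
            # Report failures to the GUI instead of losing them.
            results.put(("error", err))
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

def poll_results(results):
    """Call from the GUI thread, e.g. on a timer. Never blocks."""
    try:
        return results.get_nowait()
    except queue.Empty:
        return None
```

An event handler would create a queue.Queue, pass a function that makes the POP calls as task, and return immediately; the GUI then picks up the ("ok", value) or ("error", exception) pair the next time it polls.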
