Python Email Libraries, part 2: IMAP
Searching for Messages
The first article in this series discussed how to access a POP3 server with a Python script. While that protocol is useful for learning the basics of how email works, IMAP is the protocol most used today. This article covers this more complicated protocol.
Once you have connected and selected the correct mailbox, you will want to search for specific messages. You could download the relevant information for every message in the mailbox and then process all of that data locally, but that is rarely necessary. IMAP defines a search command that lets the server do the filtering for you. This has two clear benefits. First, only the matching message numbers cross the network, which greatly reduces traffic and speeds up your program. Second, it reduces the amount of filtering logic you have to write yourself.
There are many attributes you can search on. For instance, you can search for only new messages, messages of a certain size, messages from a particular person or email address, or messages that contain a specific string in the body. When you list several search criteria, they are combined with an implicit AND, but you can use the prefix operator OR to express an “or” condition. Some examples look like this:
r, data = server.search(None, "(FROM \"fred\")")
r, data = server.search(None, "(SMALLER 20000)")
r, data = server.search(None, "(NEW)")
r, data = server.search(None, "(OR NEW SMALLER 20000)")
These examples should be fairly self-explanatory. The first finds all messages whose “From” header contains the string “fred”. The second finds all messages smaller than 20,000 octets (bytes), or roughly 20 KB; IMAP expresses message sizes in octets, so a limit given in kilobytes has to be multiplied by 1000. The third finds all messages flagged as new, meaning messages that arrived recently and have not been read yet (UNSEEN matches every unread message, however old). Finally, the last finds all messages that are either new or smaller than 20,000 octets.
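String arguments such as “fred” in the FROM example are sent to the server as IMAP quoted strings, so any double quote or backslash inside them has to be escaped with a backslash. The following is a minimal sketch of a helper that does this escaping. The names `imap_quote` and `from_criterion` are hypothetical (they are not part of imaplib), and the sketch assumes plain ASCII input.

```python
def imap_quote(text: str) -> str:
    """Return text as an IMAP quoted string (hypothetical helper, ASCII input assumed)."""
    if "\r" in text or "\n" in text:
        # Quoted strings may not contain CR or LF; such text would need an IMAP literal.
        raise ValueError("CR/LF is not allowed in an IMAP quoted string")
    # Escape backslashes first, then double quotes, so the added backslashes are not doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def from_criterion(sender: str) -> str:
    """Build a FROM search key such as (FROM "fred") (hypothetical helper)."""
    return f"(FROM {imap_quote(sender)})"


print(from_criterion("fred"))             # (FROM "fred")
print(from_criterion('Fred "the boss"'))  # (FROM "Fred \"the boss\"")
```

The result can be passed straight to the search call, for example `server.search(None, from_criterion("fred"))`.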
Notice that OR is a prefix operator: it must be placed before the two search keys it combines. NOT is also a prefix operator, but it applies to the single search key that follows it. Another thing to note in these examples is the tuple on the left side of the equals sign, which unpacks the return value immediately into the response part and the data part. This differs somewhat from the response processing shown earlier, but it lets you grab both the response and the data in one step.
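Because OR takes exactly two operands, matching any of three or more keys means nesting one OR inside another. The sketch below builds such strings; `any_of`, `all_of`, and `negate` are hypothetical helpers, not imaplib functions. Each key is wrapped in parentheses, which IMAP accepts as a grouped search key, so multi-word keys like SMALLER 20000 stay unambiguous. The unparenthesized form `(OR NEW SMALLER 20000)` used above works just as well for simple cases.

```python
def negate(key: str) -> str:
    """Prefix a single search key with NOT (hypothetical helper)."""
    return f"(NOT ({key}))"


def all_of(*keys: str) -> str:
    """Combine keys with IMAP's implicit AND by listing them side by side."""
    return "(" + " ".join(f"({key})" for key in keys) + ")"


def any_of(*keys: str) -> str:
    """Combine two or more keys with prefix OR, nesting because OR takes two operands.

    any_of("A", "B", "C") -> (OR (A) (OR (B) (C)))
    """
    if len(keys) < 2:
        raise ValueError("any_of needs at least two search keys")
    # Start from the last key and wrap each earlier key around it.
    result = f"({keys[-1]})"
    for key in reversed(keys[:-1]):
        result = f"(OR ({key}) {result})"
    return result


print(any_of("NEW", "SMALLER 20000"))   # (OR (NEW) (SMALLER 20000))
print(negate("SEEN"))                    # (NOT (SEEN))
print(all_of("UNSEEN", 'FROM "fred"'))   # ((UNSEEN) (FROM "fred"))
```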
The data portion of the return value holds the message sequence numbers that satisfy the search criteria; in Python 3, imaplib returns it as a list whose single element is a byte string of space-separated numbers, such as [b'1 4 7']. In all of the statements above, the first argument is the “charset” argument. It names the character encoding of the strings in the search criteria themselves, which matters when you search for text in languages whose alphabets need characters outside plain ASCII. Here we pass None for simplicity: imaplib then sends no CHARSET specification and the server uses its default, normally US-ASCII, which is fine for plain English search strings.
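Putting the tuple unpacking and the data format together, a small wrapper can check the response and turn the data into a list of integers. This is a sketch that assumes an imaplib connection that is already logged in with a mailbox selected; `search_ids` is a hypothetical name, and `FakeServer` is a stand-in used only so the example runs without a mail server.

```python
from typing import List, Optional


def search_ids(server, criteria: str, charset: Optional[str] = None) -> List[int]:
    """Run SEARCH on a selected imaplib connection and return sequence numbers."""
    # Unpack the (response, data) pair in one step.
    typ, data = server.search(charset, criteria)
    if typ != "OK":
        raise RuntimeError(f"SEARCH failed: {typ} {data!r}")
    # data is a one-element list such as [b"1 4 7"], or [b""] when nothing matches.
    return [int(number) for number in data[0].split()]


class FakeServer:
    """Hypothetical stand-in for an imaplib.IMAP4 connection, for offline runs."""

    def __init__(self, reply):
        self.reply = reply

    def search(self, charset, *criteria):
        return self.reply


print(search_ids(FakeServer(("OK", [b"1 4 7"])), "(UNSEEN)"))  # [1, 4, 7]
print(search_ids(FakeServer(("OK", [b""])), "(UNSEEN)"))       # []
```

With a real connection you would call, for example, `search_ids(server, "(UNSEEN)")`. Checking the response before touching the data means a NO reply from the server surfaces as a clear exception rather than as a confusing parsing error.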