Email Address Verification with PHP

Many applications in the field of Web development need to validate email addresses. While this can be done in a variety of ways, one simple but effective way involves writing your own functions in PHP. Alejandro Gervasio explains this approach.

Within the huge and fascinating field of Web development, one of the most common tasks that many applications have to deal with is, undoubtedly, verifying whether a user email address is valid. Certainly, this should sound very familiar to most Web developers, whether they are setting up their first consciously-coded script or implementing full-blown applications required to handle more complex processes. Whatever the case, validating a visitor’s email address to see if it belongs to a real domain is always a good step to help you avoid, at least partially, several possible problems that arise when applications are receiving incoming bogus data. From cluttering up databases with invalid information, to sending newsletters or similar content to email addresses at nonexistent domains, headaches are surely going to come up from receiving fake email.

Several approaches can be taken to address the problem, depending on the level of complexity desired for the validation itself. If the application only needs a basic level of validation, a quick-and-dirty way to handle the situation is to implement a simple PHP function that matches the address against a standardized email format, as we have seen many times. However, when deeper and more complex validation is required, we should take a look at well-trusted validation classes, such as PEAR’s HTML_QuickForm class, or the many other validation classes widely available.

The third option involves writing our own set of functions for in-depth email address checking, which can be considered an intermediate solution between the two described above. This approach is versatile and portable enough to be used whether we want to expand basic validating functions or add extra functionality to existing classes.

In this article, we’ll develop a reusable step-by-step solution to validate a user’s email address as accurately as possible, in an attempt to save some work the next time an application needs to check for email validity. The process will show the power of some interesting PHP built-in network functions, and demonstrate how to noticeably reduce the chances of polluting our applications with bogus user-supplied email addresses.

{mospagebreak title=Validating the proper format of an email address}

The first step to validating an email address is to check whether it is in the standard format. If you’re a seasoned programmer, feel free to skip over this description. However, for the sake of giving a complete explanation, we’re going to begin by defining the first checking function, which takes advantage of PHP’s built-in support for regular expressions:

function checkEmail($email) {
 if (!preg_match('/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/', $email)) {
  return false;
 }
 return true;
}

Nothing unexpected, right? The simplistic checkEmail() function validates the format of a user’s email address by encoding its standardized format in a regular expression. The preg_match() PHP built-in function looks for matches to the email pattern, given the string $email passed as a parameter. If matches are found, the function will return true. Otherwise, it will return false. It’s a very simple concept, really.

A bit of analysis clearly shows that many invalid email addresses passed as an argument will still match this regular expression. This will result in the function returning true, and treating those addresses as valid data. Though it may be impossible to catch every invalid email address this way, performing validation routines on the format itself can improve our overall checking process.
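To make that limitation concrete, the sketch below runs a few sample addresses through the same pattern. The helper name matchesEmailPattern() and the sample addresses are illustrative assumptions, not part of the original function.

```php
<?php
// Hypothetical helper: wraps the article's pattern so it can be tried on samples.
function matchesEmailPattern(string $email): bool
{
    $pattern = '/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/';
    return preg_match($pattern, $email) === 1;
}

$samples = [
    'user@example.com',      // well formed, accepted
    'user@example..com',     // consecutive dots in the domain, still accepted
    'user@example',          // no top-level domain, still accepted
    'user+tag@example.com',  // legitimate "+" in the local part, rejected
];

foreach ($samples as $sample) {
    echo $sample, ' => ', matchesEmailPattern($sample) ? 'match' : 'no match', PHP_EOL;
}
```

The pattern is both too loose and too strict, which is why the later DNS checks matter. PHP's built-in filter_var() with the FILTER_VALIDATE_EMAIL flag is another option for the format step, if a hand-written pattern is not required.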

Since this function on its own is insufficient for checking whether an email address is valid, we need to look for other ways of improving the validation process. The next step is to check whether an email address corresponds to a real domain by making sure there is a domain registration record for the domain that the user entered. How we achieve that is the subject of the next section.

{mospagebreak title=Validating email domains with checkdnsrr()}

In order to check whether a user’s email address actually corresponds to a real domain, we should search for the proper domain records in the DNS. By doing so, we’re making sure that the supplied email address belongs to an existing domain. To do this, we can use a couple of PHP lookup functions that come in handy for addressing these problems.

The checkdnsrr() function checks DNS records corresponding to a given Internet host name or IP address. It searches the DNS for records of a specific type corresponding to the given host, returning true if any records are found, or returning false if no records are found or if an error occurs.

It has the following format:

bool checkdnsrr ( string host [, string type] );


The function accepts the following types of records: A, MX, NS, SOA, PTR, CNAME, or ANY, with MX (Mail Exchange) as the default type. So, if no type is provided, the function will search MX records for the given host. In our case, we need to look for MX records for the host named in the email address. Therefore, it’s pretty easy to code a new function that checks for the existence of the corresponding MX entries for a given host. Let’s write the function to do that:

function checkEmail($email) {
 if(preg_match('/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/', $email)){
  // explode() replaces split(), which was removed in PHP 7
  list($username,$domain)=explode('@',$email);
  if(!checkdnsrr($domain,'MX')) {
   return false;
  }
  return true;
 }
 return false;
}

The above function accepts an email address as a string and checks whether it fits the proper format and whether the domain is real. In order to obtain the domain part, we split the email address into the username and domain sections, respectively, using PHP’s explode() function (the older split() function was removed in PHP 7), as listed below:

list($username,$domain)=explode('@',$email);

Now, the $domain variable stores the corresponding domain. Since we’re interested only in this, all we need to do is pass it in as a parameter to PHP’s checkdnsrr() function to determine whether it’s a real domain. It will look for the Mail Exchange record in the DNS (remember that the default type is MX), and return true if an MX record is found, which shows that the address has a valid email domain. If the function returns false, the email domain is not valid. Obviously, an email address with a real email domain doesn’t necessarily have a valid user name. We have a big challenge ahead.
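One refinement worth considering: under the SMTP rules (RFC 5321), a domain with no MX record can still receive mail at the host named by its address record, so a strict MX-only test may reject working domains. A minimal sketch of that fallback follows. The name domainHasMailRecords() and its injectable $lookup parameter are hypothetical, introduced here so the logic can be exercised without touching the network.

```php
<?php
// Hypothetical helper: true when DNS shows somewhere to deliver mail for $domain.
// $lookup defaults to PHP's checkdnsrr(); a stub can be passed in for testing.
function domainHasMailRecords(string $domain, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';
    if ($domain === '') {
        return false;
    }
    // Prefer MX, then fall back to address records (the "implicit MX" case).
    foreach (['MX', 'A', 'AAAA'] as $type) {
        if ($lookup($domain, $type)) {
            return true;
        }
    }
    return false;
}

// Offline demonstration with a stubbed resolver that only knows one A record.
$stub = function (string $host, string $type): bool {
    return $host === 'example.test' && $type === 'A';
};
var_dump(domainHasMailRecords('example.test', $stub));
var_dump(domainHasMailRecords('missing.test', $stub));
```

In live code the second argument would simply be omitted, so that the real checkdnsrr() performs the lookups.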

Once we have defined the new function, we could call it this way, assuming that we have an incoming email address from a POST form:

$email = trim($_POST['email']);
if(!checkEmail($email)) {
 echo 'Invalid email address!';
}
else {
 echo 'Email address is valid';
}

Our function is very easy to implement and powerful enough to handle the problem of verifying that we’re dealing with an existing domain. However, the PHP manual notes that checkdnsrr() was not implemented on Windows platforms before PHP 5.3.0. It is therefore useful to have a version that works on Windows for developers whose applications run on older Windows servers.

There are some workarounds to deal with this. Most of these involve writing a custom version of the checkdnsrr() function. So let’s move on and write some code for this Windows-based function.

{mospagebreak title=Customizing checkdnsrr()  for Windows}

The customCheckDnsrr() function is a classical solution for implementing the desired functionality of checkdnsrr() on a Windows platform; it is extensively used across numerous scripts. The code for our new function is as follows:

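function customCheckDnsrr($host,$recType='') {
 if(!empty($host)) {
  if($recType=='') $recType='MX';
  // escape both arguments so user input cannot inject shell commands
  exec('nslookup -type='.escapeshellarg($recType).' '.escapeshellarg($host),$output);
  foreach($output as $line) {
   // preg_quote() keeps the dots in the host name literal
   if(preg_match('/^'.preg_quote($host,'/').'/', $line)) {
    return true;
   }
  }
  return false;
 }
 return false;
}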

Our version of the checkdnsrr() function works by running nslookup, a command-line utility available on Windows systems. It resembles the checkdnsrr() functionality, and is very useful for achieving the same results. We run nslookup through the PHP exec() function, which is one of several ways to execute a system command in PHP. The output of the command is stored, one line per element, in the $output array.

When the nslookup command is executed, it searches the DNS for the entry corresponding to the given domain. If the lookup is successful, the output is similar to the following lines:

Server:  ns1.infoar.net
Address:   200.80.203.242
calop.com.ar   MX preference = 10, mail exchanger = mail.infoar.net

To determine whether a proper mail handler for that domain exists, the function iterates over each line of the output by searching for the line that begins with the provided host name. If the line is found, then the function will return true. Otherwise, it will return false.
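The line-scanning step can be tried on its own, without running nslookup. The sketch below assumes Windows-style output like the sample above. The helper name outputHasMxLine() is hypothetical, and it tightens the match slightly by also requiring the "MX" token after the host name.

```php
<?php
// Hypothetical helper: scans captured nslookup lines for an MX answer for $host.
function outputHasMxLine(array $lines, string $host): bool
{
    // preg_quote() keeps the dots in the host name literal.
    $pattern = '/^' . preg_quote($host, '/') . '\s+MX\b/i';
    foreach ($lines as $line) {
        if (preg_match($pattern, $line) === 1) {
            return true;
        }
    }
    return false;
}

// Sample lines copied from the output shown above.
$output = [
    'Server:  ns1.infoar.net',
    'Address:   200.80.203.242',
    'calop.com.ar   MX preference = 10, mail exchanger = mail.infoar.net',
];
var_dump(outputHasMxLine($output, 'calop.com.ar'));
var_dump(outputHasMxLine($output, 'example.invalid'));
```

Keeping the parsing separate from the exec() call makes it possible to test against saved output, which helps because nslookup formats its answers differently across platforms.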

Here’s the code for using our customCheckDnsrr() function:

function checkEmail($email) {
 if(preg_match('/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+$/', $email)){
  // explode() replaces split(), which was removed in PHP 7
  list($username,$domain)=explode('@',$email);
  if(!customCheckDnsrr($domain)){
   return false;
  }
  return true;
 }
 return false;
}

The snippet is almost identical to the one using checkdnsrr(). The only change is that the built-in call has been replaced with the customized function defined previously.
 
So far, we have a working function to be properly implemented under Windows platforms. Similarly, we might replace checkdnsrr() with PHP’s getmxrr() within our checkEmail() function. Let’s see this new alternative to determine the validity of an email address’ domain in more detail.

{mospagebreak title=Using getmxrr() for validation}

It’s possible to use the getmxrr() PHP function to achieve email domain validation similar to that obtained with checkdnsrr(). This function gets MX records corresponding to a given Internet host name and has the following format:

bool getmxrr ( string hostname, array mxhosts [, array weight] );

It searches the DNS for MX records corresponding to the given host name. It returns true if any records are found and returns false if no records are found or if an error occurs.

A list of the MX records found is placed into the array mxhosts. If the weight array is given, it will be filled with the weight information gathered.
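The weight values are the MX preference numbers, where a lower number means a more preferred mail server. As a small illustration, the sketch below orders a host list by preference. The helper name sortMxByPreference() and the sample host names are assumptions made for this example, and the data is hard-coded so the snippet runs without a DNS lookup.

```php
<?php
// Hypothetical helper: orders MX hosts from most to least preferred.
function sortMxByPreference(array $hosts, array $weights): array
{
    if ($hosts === [] || count($hosts) !== count($weights)) {
        return $hosts;
    }
    $byHost = array_combine($hosts, $weights); // host name => preference value
    asort($byHost);                            // lower value = higher priority
    return array_keys($byHost);
}

// Sample data shaped like what getmxrr() fills in; the host names are made up.
$mxhosts = ['backup.example.test', 'mail.example.test'];
$weight  = [20, 10];
print_r(sortMxByPreference($mxhosts, $weight));
```

In live code, a call such as getmxrr($domain, $mxhosts, $weight) would populate the two arrays before they are passed to the helper.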

Having presented this network function, it’s easy to see that we could rewrite our checkEmail() function using getmxrr() instead of checkdnsrr() to validate email domains. The revamped version is listed below:

function checkEmail($email) {
 if(preg_match('/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/', $email)){
  // explode() replaces split(), which was removed in PHP 7
  list($username,$domain)=explode('@',$email);
  if(!getmxrr($domain,$mxhosts)){
   return false;
  }
  return true;
 }
 return false;
}

And we call it in the following manner:

$email = trim($_POST['email']);
if(!checkEmail($email)) {
 echo 'Invalid email address!';
}
else {
 echo 'Email address is valid';
}

This code looks very similar to the previous example using checkdnsrr(). The only subtle difference lies in that we have wrapped getmxrr() into the checkEmail() function. It’s a simple but powerful solution. As we can appreciate, the set of PHP network functions is an invaluable tool for validating email domains.

So far, we have defined individual functions to first, validate the proper format for an email address, and next, check whether the email domain is a real domain. From a strict point of view, this solution is still incomplete, because we really don’t know if the given user name is valid. How can this issue be addressed properly?

Actually, there is no direct way to do that. However, a fairly handy way to judge whether we are dealing with a valid user name is to find out whether the email domain is currently in use. In that way, we can be somewhat more certain (but never completely) that someone is using that domain to send and receive email messages. This might bring us to the conclusion that the given user name is potentially valid. Let’s move to the next section to find out how we can determine this with a little extra work.

{mospagebreak title=Empowering validation with fsockopen()}

In order to find out whether an email domain is really in use, we’re going to take advantage of PHP’s fsockopen() function, which is used to open Internet or Unix domain socket connections. This serves our purposes handily, since we can try opening a socket connection to the mail server identified with the given domain. If the socket connection is successfully opened, then the supplied domain is currently in use.

The format for fsockopen() is the following:

resource fsockopen ( string hostname, int port [, int errno [, string errstr [, float timeout]]] );

The function, when used for Internet domains, will open a TCP socket connection to the provided host name on the supplied port, and return a file pointer corresponding to that host. If the call fails, it will return false, and if the optional errno and errstr arguments are present, they will be set to indicate the actual system level error that occurred while performing the call. The optional timeout parameter can be used to set a timeout in seconds for the connect system call. 
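To show how the errno, errstr and timeout arguments fit together, here is a minimal sketch. The helper name tryConnect() is hypothetical, and the demonstration connects to a temporary listener on the local machine rather than to a real mail server, so it does not depend on outside network access.

```php
<?php
// Hypothetical helper: reports whether a TCP connection can be opened, and why not.
function tryConnect(string $host, int $port, float $timeout = 5.0): array
{
    $errno = 0;
    $errstr = '';
    // "@" silences the warning fsockopen() raises on failure; we report it ourselves.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return ['ok' => false, 'error' => "[$errno] $errstr"];
    }
    fclose($socket);
    return ['ok' => true, 'error' => ''];
}

// Self-contained demo: listen on a free local port, then connect to it.
$server = stream_socket_server('tcp://127.0.0.1:0', $srvErrno, $srvErrstr);
if ($server !== false) {
    $address = stream_socket_get_name($server, false); // local "ip:port" string
    $port = (int) substr($address, strrpos($address, ':') + 1);
    var_dump(tryConnect('127.0.0.1', $port)['ok']);
    fclose($server);
}
```

Returning the error text alongside the result makes it easier to tell a refused connection from a timeout when the check fails.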

Having taken a look at what this function does, it’s feasible to open a socket connection on port 25 (the default port for SMTP servers) to the domain of a user’s email address in the following manner:

if(!fsockopen($domain,25,$errno,$errstr,30)) {
 return false;
}

Here we’re trying to open a socket connection to the provided domain on port 25, setting a timeout of 30 seconds for the connection. If the connection is successfully established, fsockopen() returns a file pointer, which evaluates as true; this means that an SMTP server is up and running, the email domain is real and, hopefully, there is a valid user for that domain. If the connection fails, the function returns false, indicating that the domain is not being used, at least at the moment we attempted to open the socket connection. As you can easily guess, there might be several reasons for a failed result. Even if the user is valid, the mail server might be down, our system might be having its own problems, or other difficulties inherent to any network process might exist.

Anyway, our rough attempt to enhance the validation process is still a valid effort worth considering. Here’s the checkEmail() function with the new enhancements:

function checkEmail($email) {
 // checks proper syntax
 if(preg_match('/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/', $email)) {
  // gets domain name (explode() replaces split(), which was removed in PHP 7)
  list($username,$domain)=explode('@',$email);
  // checks for MX records in the DNS
  if(!checkdnsrr($domain,'MX')) {
   return false;
  }
  // attempts a socket connection to mail server
  $socket=@fsockopen($domain,25,$errno,$errstr,30);
  if(!$socket) {
   return false;
  }
  // closes the connection once we know it can be opened
  fclose($socket);
  return true;
 }
 return false;
}

And the code to call the function is listed as follows:

$email = trim($_POST['email']);
if(!checkEmail($email)) {
 echo 'Invalid email address!';
}
else {
 echo 'Email address is valid';
}

We have taken a considerable step forward to improve the validation routines within our function.

To explain what we did step-by-step: once the email address is passed to the function, it is first validated to make sure it matches the regular expression. If the validation is successful, then the address is divided to obtain the email domain.

Then, the function checks whether the domain is real, looking for MX records in the DNS. Again, if the records are found, the next step is to open a socket connection for that domain on port 25, to determine whether the given domain is in use. If the connection is successful, we’re pretty sure that the email address corresponds to a real domain, which is currently in use, and the user name is potentially valid.
If any of these checks fails, the function stops at that point and returns false, indicating that the supplied email address is not valid.
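One caveat about the socket step: the host that accepts mail for a domain is usually the one named in its MX records, which is not always the domain itself. The sketch below is one possible variation under that assumption, not the method used above. It tries each MX host and also checks that the server answers with the SMTP "service ready" code 220. All three function names are hypothetical, and outbound connections on port 25 are blocked on many networks, so a false result here is not proof that the domain is dead.

```php
<?php
// Hypothetical helper: true when $line is an SMTP "service ready" greeting (code 220).
function isSmtpGreeting(string $line): bool
{
    return preg_match('/^220[ -]/', $line) === 1;
}

// Hypothetical helper: connects to $host on port 25 and checks the greeting.
function smtpHostResponds(string $host, float $timeout = 10.0): bool
{
    $socket = @fsockopen($host, 25, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    stream_set_timeout($socket, (int) $timeout); // limit how long we wait to read
    $greeting = fgets($socket, 512);
    fwrite($socket, "QUIT\r\n");                 // end the session politely
    fclose($socket);
    return $greeting !== false && isSmtpGreeting($greeting);
}

// Hypothetical helper: tries each MX host for $domain, falling back to the domain itself.
function mailServerResponds(string $domain): bool
{
    $hosts = [];
    if (!getmxrr($domain, $hosts) || $hosts === []) {
        $hosts = [$domain];
    }
    foreach ($hosts as $host) {
        if (smtpHostResponds($host)) {
            return true;
        }
    }
    return false;
}
```

A call such as mailServerResponds($domain) could stand in for the bare fsockopen() test inside checkEmail(), at the cost of a slower check when several MX hosts have to be tried.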

Finally we’ve successfully reached our objective, with a few lines of PHP code. Not too bad, huh?

Summary

As with many other kinds of user data, email addresses are certainly pretty hard to validate. We can never be completely sure that what a visitor gives us is valid input. However, as reviewed in this article, combining several powerful PHP network functions is a great way to make the validation process a relatively painless task. Additionally, we’ve taken an instructive look at some other concepts, such as working with lookup functions and sockets, even though we only scratched their surfaces. Thus, the next time you need to implement email verification in your PHP applications, don’t forget these invaluable tools. They’re really worthwhile.
