PHP URL Validation Functions

Validating URLs is an important part of form handling and data processing in PHP. There are numerous ways to do it; this article takes a look at the most commonly used methods of validating URLs in PHP: regular expressions (regex) and PHP's built-in FILTER_VALIDATE_URL filter.

Bear in mind that both methods have their own strengths and weaknesses, and these will be examined thoroughly in this article. The objective is to recommend the best way of validating URLs based on the available choices and the test results.

The Validation Functions to be Evaluated

Searching Google for "validating URL PHP" or "PHP URL validation" (without quotes) yields the following six results, which any developer can use and integrate into their own application:

#1. Source: http://php.net/manual/en/function.preg-match.php 

Validating Function:

<?php
function validateURL($url){
    $regex  = '((https?|ftp)\:\/\/)?';                                     // scheme
    $regex .= '([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?'; // user and password
    $regex .= '([a-z0-9-.]*)\.([a-z]{2,3})';                               // host name and TLD
    $regex .= '(\:[0-9]{2,5})?';                                           // port
    $regex .= '(\/([a-z0-9+\$_-]\.?)+)*\/?';                               // path
    $regex .= '(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?';                  // GET query
    $regex .= '(#[a-z_.-][a-z0-9+\$_.-]*)?';                               // anchor
    if (preg_match("/^$regex$/", $url)) {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="blue">Valid URL</font>';
    } else {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="red">Invalid URL</font>';
    }
}
?>

#2. Source: http://phpcentral.com/208-url-validation-in-php.html

Validating Function:

<?php
function validateURL($url){
    // As published, "." is unescaped, so it matches any character, not only a dot.
    $regex = '|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i';
    if (preg_match($regex, $url)) {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="blue">Valid URL</font>';
    } else {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="red">Invalid URL</font>';
    }
}
?>

#3. Source: http://www.blog.highub.com/regular-expression/php-regex-regular-expression/php-regex-validating-a-url/

Validating Function:

<?php
function validateURL($url){
    // Note: the range A-f (rather than A-F) also matches some punctuation; the pattern is kept as tested.
    $regex = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
    if (preg_match($regex, $url)) {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="blue">Valid URL</font>';
    } else {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="red">Invalid URL</font>';
    }
}
?>

#4. Source: forum post by Pyro, http://www.webdeveloper.com/forum/archive/index.php/t-11290.html

Validating Function:

<?php
function validateURL($url){
    $regex = '/^(http(s?):\/\/|ftp:\/\/{1})((\w+\.){1,})\w{2,}$/i';
    if (preg_match($regex, $url)) {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="blue">Valid URL</font>';
    } else {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="red">Invalid URL</font>';
    }
}
?>

#5. Source: http://php.net/, filter validate URL function: http://php.net/manual/en/filter.filters.validate.php

Validating Function:

<?php
function validateURL($url){
    // filter_var() returns the URL on success and false on failure.
    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="blue">Valid URL</font>';
    } else {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="red">Invalid URL</font>';
    }
}
?>

#6. Source: http://stackoverflow.com/questions/206059/php-validation-regex-for-url

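Validating Function:

<?php
function validateURL($url){
    // PHP 7+ rejects the "e" modifier (it only ever applied to preg_replace()), so only "i" is used.
    // The pattern is unanchored, so it finds a URL anywhere in the string.
    $regex = '#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|"|\'|:|\<|$|\.\s)#i';
    if (preg_match($regex, $url)) {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="blue">Valid URL</font>';
    } else {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="red">Invalid URL</font>';
    }
}
?>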

The URLs to be Tested

Below are the URLs to be tested against the above URL validating functions.

Acceptable URLs (one URL per line):

http://www.example.org
http://www.example.org/
www.example.org/
www.example.org
example.org
subdomain.example.org
https://example.org
https://example.subdomain.org
http://www.example.com/discus/messages/131/24297.html
http://php.net/manual/en/function.preg-match.php
http://www.lgts.com/catalog/product_info.php?products_id=6&osCsid=be3
http://192.168.1.1
http://192.168.1.1/
http://64.233.167.99/
http://siteexplorer.search.yahoo.com/search?p=www.x.com&bwmo=d&bwmf=u
http://www.example.com/2009/01/08/rfc-example-url-validation
http://mp3hungama.com/music/genre_albums.php?id=3
www.enfocus.com/product.php?id=855
http://www.example.com/space%20here.html
http://www.example.com/space here.html
osdir.com/ml/unassigned-bugs/2010-04/msg00162.html
forums.asp.net/p/1157859/1905808.aspx
http://forums.asp.net/p/1157859/1905808.aspx
https://forums.asp.net/p/1157859/1905808.aspx
http://url.com/?source=rss_feed
https://www.sound.com/catalog/account.php?osCsid=07b6922f54ed9674582
https://www.thehayexperts.co.uk/index.php?osCsid=xlo8u8nl8m4t725
ftp://example.com
http://example.com/index.asp
http://www.smallnetbuilder.com/component/option,com_chart/Itemid,189/
http://example.org:80
ftp://asmith@ftp.example.org
HTTP://EN.EXAMPLE.ORG/
HTTP://EXAMPLE.ORG/
HTTP://WWW.EXAMPLE.ORG
http://example.com/redirect?url=http%3A%2F%2Fplanio.com
http://www.example.com:8080
www.linux-rules-the-world.com
http://www.google.com/company_secrets.htm
http://askville.amazon.com/phones/AnswerViewer.do?requestId=7665185

Malformed URLs (one URL per line):

C:forums.asp.net/p/1157859/1905808.aspx
dfds://example.com
http://www.example.com.
http://www.example.com/.
http://.example.com
http://example/
http:///example.com
http://www
htp://www.google.com
http//www.google.com
http://example.com/index.php//
http://example.com//
/newfaq/basic/url.html
http://www.example.commain.html
example.123
http://username:password@hostname/path?arg=value#anchor
.example.
example
http://-example.com
http://example-.com
//example.com.
http://www_google_com
http://www-google-com
http:forums.asp.net/p/1157859/1905808.aspx
http://somedomain.com/ind%ex.html
http://....../path/?query#fragment
http://...../

The Test PHP Script

To test the above functions properly, we need a PHP script that serves as a test harness. The test scripts for the six functions are nearly identical, differing only in the validating function at the top; the one for choice #1 is shown below.

To test choice #1:

<?php
function validateURL($url){
    $regex  = '((https?|ftp)\:\/\/)?';                                     // scheme
    $regex .= '([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?'; // user and password
    $regex .= '([a-z0-9-.]*)\.([a-z]{2,3})';                               // host name and TLD
    $regex .= '(\:[0-9]{2,5})?';                                           // port
    $regex .= '(\/([a-z0-9+\$_-]\.?)+)*\/?';                               // path
    $regex .= '(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?';                  // GET query
    $regex .= '(#[a-z_.-][a-z0-9+\$_.-]*)?';                               // anchor
    if (preg_match("/^$regex$/", $url)) {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="blue">Valid URL</font>';
    } else {
        echo htmlspecialchars($url) . '&nbsp;&nbsp;&nbsp;=<font color="red">Invalid URL</font>';
    }
}

// Read data.txt into an array, one URL per element, without newlines or blank lines.
$a = file('data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// Number each URL and print its result (each() was removed in PHP 8.0, so iterate with foreach).
foreach ($a as $key => $value) {
    $countme = $key + 1;
    echo $countme . '.)&nbsp;';
    validateURL($value); // validateURL() echoes its own result
    echo '<br />';
}
?>

Discussion of the Testing Script

First, the validating function is placed at the top of the PHP script. All of the URLs to be tested are stored in an external text file (data.txt) in the same directory as the script. Inside data.txt, the URLs are listed one per line: the 40 acceptable URLs first, followed by the 27 malformed URLs.

Then this line:

$a = file('data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

reads the contents of the text file (the URLs listed previously) into the array $a. FILE_IGNORE_NEW_LINES strips the newline from the end of each element, and FILE_SKIP_EMPTY_LINES skips blank lines.

Once all of the URLs are in the array, this loop:

foreach ($a as $key => $value) {
    $countme = $key + 1;
    echo $countme . '.)&nbsp;';
    validateURL($value);
    echo '<br />';
}

reads each URL in the array, numbers it, and validates it with the validating function. The web browser then shows each URL marked as either a valid or an invalid URL.

If you want to test more URLs, simply add them to data.txt. You can download the URL validation script here for reference and for your own use: http://www.php-developer.org/wp-content/uploads/tutorials/urlvalidation.zip

Unzip it into your XAMPP htdocs folder and open it on localhost: http://localhost/urlvalidation/

Analysis and Discussion of Results

You can also view the results here online: http://www.php-developer.org/urlvalidation/

This is how the results will look, for example, for regex #1: http://www.php-developer.org/urlvalidation/1.php

Here is how each validating function is analyzed. The URLs numbered 41 to 67 are malformed, and this regex fails to detect 3 of them (instead of being marked "Invalid URL," they are marked "Valid URL"). The %slippage is therefore:

%slippage = 3/27 = 11.11% (the lower the percentage, the better)

The URLs numbered 1 to 40 are acceptable. This validating function over-judges 14 of them (instead of being marked valid, they are marked invalid). Therefore, %overjudgement = 14/40 = 35% (the lower the percentage, the better).

Finally, to measure the overall performance of a validating function, take the average of its %slippage and %overjudgement. For #1, that is (11.11% + 35%)/2 = 23.06%.
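A minimal sketch of this scoring, assuming each validator is rewritten to return true or false instead of echoing HTML. The helper name scoreValidator is hypothetical and not part of the downloadable script; it simply counts the two kinds of mistakes and applies the formulas above.

```php
<?php
/**
 * Hypothetical helper: score a validator that returns true (valid) or
 * false (invalid) against lists of known-good and known-bad URLs.
 */
function scoreValidator(callable $isValid, array $acceptable, array $malformed): array
{
    if (count($acceptable) === 0 || count($malformed) === 0) {
        throw new InvalidArgumentException('Both URL lists must be non-empty.');
    }
    // Slippage: malformed URLs wrongly accepted as valid.
    $slipped = count(array_filter($malformed, $isValid));
    // Overjudgement: acceptable URLs wrongly rejected as invalid.
    $overjudged = count($acceptable) - count(array_filter($acceptable, $isValid));

    $slippage      = 100 * $slipped / count($malformed);
    $overjudgement = 100 * $overjudged / count($acceptable);

    return [
        'slippage'      => round($slippage, 2),
        'overjudgement' => round($overjudgement, 2),
        // Overall figure: the plain average of the two percentages.
        'average'       => round(($slippage + $overjudgement) / 2, 2),
    ];
}
```

To score a real validator, pass a closure such as fn($u) => filter_var($u, FILTER_VALIDATE_URL) !== false together with array_slice($a, 0, 40) and array_slice($a, 40) from the test script.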

Below is the summary for the rest of the validating functions:

Based on the results, validating function #1 produces the lowest average of %slippage and %overjudgement. Validating function #4 has zero slippage, but it over-judges 87.5% of the time, so its average (43.75%) is poor. The FILTER_VALIDATE_URL filter, validating function #5, averages 31.62%; its %overjudgement is low, but its %slippage is high.

In other words, none of the functions tested is perfect, with both zero %slippage and zero %overjudgement. Each has its own strengths and weaknesses.

Recommendations

So which validating function is recommended? Based on the evaluation results, you can select validating function #1, since it has the lowest combined %slippage and %overjudgement of the functions tested. More accurate functions that are not featured here may well exist or be developed, but among the functions found through Google search, this one performs best. Feel free to test your own function and post the results here if you get a lower %average; it will benefit everyone.

So how can you increase the accuracy of validation? Below are some further recommendations.

First, you can blend validating functions. Instead of relying on a single function, you can borrow features from other functions, or add extra checks, to make it stronger. For example, the main weakness of validating function #1 is that it rejects valid URLs written in capital letters as well as URLs whose host is an IP address.

One way to improve the function is to convert each URL to lower case before passing it to the validating function. You can also let FILTER_VALIDATE_URL (validating function #5) handle the validation when the host part of the URL is an IP address (made up only of numbers and dots), since the built-in filter appears to be accurate at validating URLs that contain IP addresses.
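A sketch of this blend, assuming the regex of validating function #1 and hypothetical function names (regexOne, validateURLBlended). The IP test uses parse_url() plus FILTER_VALIDATE_IP to decide when to hand over to FILTER_VALIDATE_URL; both are standard PHP, but the routing rule itself is an assumption rather than something measured in the tests above.

```php
<?php
// Validating function #1, returning a boolean instead of echoing HTML.
function regexOne(string $url): bool
{
    $regex  = '((https?|ftp)\:\/\/)?';                                     // scheme
    $regex .= '([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?'; // user and password
    $regex .= '([a-z0-9-.]*)\.([a-z]{2,3})';                               // host name and TLD
    $regex .= '(\:[0-9]{2,5})?';                                           // port
    $regex .= '(\/([a-z0-9+\$_-]\.?)+)*\/?';                               // path
    $regex .= '(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?';                  // GET query
    $regex .= '(#[a-z_.-][a-z0-9+\$_.-]*)?';                               // anchor
    return preg_match("/^$regex$/", $url) === 1;
}

// Hypothetical blended validator.
function validateURLBlended(string $url): bool
{
    // Regex #1 has no /i flag, so lower-case a local copy before checking.
    $url = strtolower($url);
    $host = parse_url($url, PHP_URL_HOST);
    if (is_string($host) && filter_var($host, FILTER_VALIDATE_IP) !== false) {
        // The host is an IP address: defer to PHP's built-in URL filter.
        return filter_var($url, FILTER_VALIDATE_URL) !== false;
    }
    return regexOne($url);
}
```

Because the lower-casing happens on the function's own copy of $url, the caller can still store the URL exactly as entered.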

Second, PHP has a function called parse_url: http://php.net/manual/en/function.parse-url.php. You can feed it validated URLs to retrieve the host name and other components for further processing; this is quicker than extracting them with string manipulation functions.
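For example, with one of the acceptable URLs from the test list (the variable names are illustrative):

```php
<?php
// Break an already-validated URL into its components.
$url = 'http://www.lgts.com/catalog/product_info.php?products_id=6&osCsid=be3';

$parts = parse_url($url);
// $parts['scheme'] is 'http', $parts['host'] is 'www.lgts.com',
// $parts['path'] is '/catalog/product_info.php',
// $parts['query'] is 'products_id=6&osCsid=be3'

// A single component can be requested directly.
$host = parse_url($url, PHP_URL_HOST); // 'www.lgts.com'

// The query string can be split further with parse_str().
parse_str($parts['query'], $query);   // ['products_id' => '6', 'osCsid' => 'be3']
```

Keep in mind that parse_url() parses but does not validate: given a scheme-less string such as www.example.org, it returns only a 'path' element, which is why it works best on URLs that have already passed validation.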

To reduce overjudgement, you can also pass the URLs rejected by the first validating function to parse_url to see whether it can retrieve a domain name, and then check whether that domain exists. You'll find a script for this here: http://psoug.org/snippet/Check_If_Domain_Exists_31.htm
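The psoug.org script is not reproduced here; as a substitute, the sketch below uses PHP's built-in checkdnsrr() to look for a DNS A record. The names hostFromUrl and domainExists are hypothetical, and prepending http:// to input without a scheme is an assumption made because parse_url() does not recognise a host in a string such as example.org/page. The DNS lookup needs network access.

```php
<?php
// Hypothetical helper: extract a lower-case host name, or null if none is found.
function hostFromUrl(string $url): ?string
{
    // Add a scheme when missing so parse_url() reports a host, not a path.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return (is_string($host) && $host !== '') ? strtolower($host) : null;
}

// Hypothetical helper: true if DNS returns an A record for the URL's host.
function domainExists(string $url): bool
{
    $host = hostFromUrl($url);
    if ($host === null) {
        return false;
    }
    return checkdnsrr($host, 'A');
}
```

A URL rejected by validating function #1 but passing domainExists() could then be flagged for a second look rather than discarded outright.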

Finally, you can check whether a URL actually exists. The user notes on the file_exists() manual page (http://php.net/manual/en/function.file-exists.php) cover this; see the tip provided by vernon at kesnerdesigns.net. Note that file_exists() itself does not work with http:// URLs, because PHP's HTTP wrapper does not support stat(); a remote check instead reads the header status of the URL, for example with get_headers(). If the status is 404, the URL does not exist.
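A minimal sketch of such a check with get_headers(), assuming allow_url_fopen is enabled and the server has network access. The function names are hypothetical, the tip mentioned above is not reproduced, and treating any status below 400 as "exists" is an assumption.

```php
<?php
// Hypothetical helper: return the final HTTP status code from a get_headers() result.
function statusFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found". After redirects
        // there can be several, and the last one is the final response.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: true if the URL answers with a non-error status.
function urlExists(string $url): bool
{
    $headers = @get_headers($url); // false on connection failure
    if ($headers === false) {
        return false;
    }
    $status = statusFromHeaders($headers);
    return $status !== null && $status < 400;
}
```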

In summary, a validating function alone cannot fully verify a URL; it should be combined with related functions, such as those above, to complete the check.
