
The Test PHP Script - PHP

Validating URLs is important to form handling and PHP data processing, and there are numerous ways to do it. This article looks at some of the most commonly used methods of validating URLs in PHP: regular expressions (regex) and PHP's built-in FILTER_VALIDATE_URL filter.

TABLE OF CONTENTS:
  1. PHP URL Validation Functions
  2. The Test PHP Script
By: Codex-M
March 02, 2011


To test the above functions properly, we need a PHP script that will serve as a vehicle for testing. The six functions feature fairly similar testing scripts, such as the one shown below.

To test choice #1:

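<?php
// Validating function #1: a regex assembled piece by piece.
// Single-quoted strings keep "$_" and the backslashes literal.
function validateURL($url){
    $regex  = '((https?|ftp)://)?';                                      // scheme
    $regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?';  // user and password
    $regex .= '([a-z0-9-.]*)\.([a-z]{2,3})';                             // host name
    $regex .= '(:[0-9]{2,5})?';                                          // port
    $regex .= '(/([a-z0-9+$_-]\.?)+)*/?';                                // path
    $regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?';                   // query string
    $regex .= '(#[a-z_.-][a-z0-9+$_.-]*)?';                              // anchor
    // "~" is used as the delimiter because the pattern itself contains "/".
    $shown = htmlspecialchars($url);  // escape the URL before printing it as HTML
    if (preg_match('~^' . $regex . '$~', $url)) {
        echo $shown."&nbsp;&nbsp;&nbsp;=".'<font color="blue">Valid URL</font>';
    } else {
        echo $shown."&nbsp;&nbsp;&nbsp;=".'<font color="red">Invalid URL</font>';
    }
}
$a=file('data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// foreach replaces the each() loop, because each() was removed in PHP 8.
foreach ($a as $key => $value) {
    $countme=$key+1;
    echo $countme.'.)'.'&nbsp;';
    validateURL($value);  // the function echoes its own result and returns nothing
    echo '<br />';
}
?>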

Discussion of testing script

First, the validating function is added to the top of the PHP script. All of the URLs to be tested go in an external text file (data.txt), one URL per line, which is placed in the same directory as the validating script.

Then this line:

$a=file('data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

reads the contents of the text file (the URLs to be tested) and puts them in the array variable $a, one URL per element. The two flags strip the trailing newline from each line and skip empty lines.
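As a minimal sketch of what those two flags do, the snippet below writes a small temporary file and reads it back. The sample URLs and the temporary file are assumptions for illustration; they are not the contents of the article's data.txt.

```php
<?php
// Hypothetical sample content: two URLs separated by an empty line.
$path = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($path, "http://example.com\n\nftp://example.org/file.txt\n");

// FILE_IGNORE_NEW_LINES drops the newline at the end of each element;
// FILE_SKIP_EMPTY_LINES then leaves the empty line out of the array.
$urls = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

// $urls now holds two elements, one per non-empty line.
foreach ($urls as $index => $url) {
    echo ($index + 1) . '.) ' . $url . PHP_EOL;
}
```

Without FILE_IGNORE_NEW_LINES every element would still end in a newline character, which would make an anchored regex test harder to reason about.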

Once all of the contents of the text file are in the array, this loop:

foreach ($a as $key => $value) {
$countme=$key+1;
echo $countme.'.)'.'&nbsp;';
validateURL($value);
echo '<br />';
}

reads each URL found in the array and passes it to the validating function. The result is then sent to the web browser, marking the URL as either valid or invalid.

If you want to add more URLs to test, you can simply add them to the data.txt. You can download the URL validation script here for reference and for your own use: http://www.php-developer.org/wp-content/uploads/tutorials/urlvalidation.zip

Unzip it and run it in your XAMPP localhost: http://localhost/urlvalidation/

Analysis and Discussion of Results

You can also view the results here online: http://www.php-developer.org/urlvalidation/

This is how the results will look, for example, for regex #1: http://www.php-developer.org/urlvalidation/1.php

Let's discuss how to analyze each of these validating functions. First, the URLs numbered 41 to 67 are malformed, and this regex fails to detect 3 of them (instead of marking them "Invalid URL," it marks them "Valid URL"). So the %slippage can be computed:

%slippage = 3/27 = 11.11% (the lower the percentage, the better)

The URLs numbered 1 to 40 are acceptable URLs. This validating function over-judged 14 of them (instead of marking them valid, it marked them invalid). Therefore, %overjudgement = 14/40 = 35% (the lower the percentage, the better).

Finally, if you need to measure the overall performance of the validating function, you can simply average both %slippage and %overjudgement.
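The arithmetic above can be sketched in PHP as follows. The function names scoreFromCounts() and scoreValidator() are hypothetical helpers, not part of the article's download, and the sketch assumes PHP 7 or later and non-empty URL lists. The validator passed in is assumed to be any callable that returns true for "valid" and false for "invalid".

```php
<?php
/**
 * Turn raw error counts into %slippage, %overjudgement and their average.
 */
function scoreFromCounts(int $slipped, int $malformedTotal, int $overjudged, int $validTotal): array
{
    $slippage      = $slipped / $malformedTotal * 100;
    $overjudgement = $overjudged / $validTotal * 100;

    return [
        'slippage'      => $slippage,
        'overjudgement' => $overjudgement,
        'average'       => ($slippage + $overjudgement) / 2,
    ];
}

/**
 * Run a bool-returning validator over two labelled URL lists and score it.
 */
function scoreValidator(callable $validator, array $validUrls, array $malformedUrls): array
{
    // Malformed URLs that the validator wrongly accepts.
    $slipped = count(array_filter($malformedUrls, $validator));
    // Acceptable URLs that the validator wrongly rejects.
    $overjudged = count($validUrls) - count(array_filter($validUrls, $validator));

    return scoreFromCounts($slipped, count($malformedUrls), $overjudged, count($validUrls));
}

// The counts reported above for validating function #1.
$scores = scoreFromCounts(3, 27, 14, 40);
printf(
    "slippage=%.2f%% overjudgement=%.2f%% average=%.2f%%\n",
    $scores['slippage'],
    $scores['overjudgement'],
    $scores['average']
);
```

With the counts for validating function #1, the average works out to roughly 23.06%.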

Below is the summary for the rest of the validating functions:

Based on the results, the first validating function (#1) produces the lowest average of %slippage and %overjudgement. Validating function #4 has zero slippage, but it over-judges 87.5% of the time, so its average is poor. PHP's built-in FILTER_VALIDATE_URL filter (validating function #5, applied through filter_var()) averages 31.62%. Its %overjudgement is low, but its %slippage is high.
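For reference, FILTER_VALIDATE_URL is a filter constant passed to filter_var() rather than a standalone function. The wrapper below is a minimal sketch of how validating function #5 could be expressed; the name isValidByFilter() and the sample URLs are assumptions for illustration.

```php
<?php
/**
 * Hypothetical wrapper around PHP's built-in URL filter.
 * filter_var() returns the URL string on success and false on failure.
 */
function isValidByFilter(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

// Sample inputs (illustrative only, not taken from the article's data.txt).
$samples = [
    'http://www.example.com/path?x=1',
    'http://192.168.0.1/',
    'www.example.com',      // no scheme, so the filter rejects it
];

foreach ($samples as $sample) {
    echo $sample . ' = ' . (isValidByFilter($sample) ? 'Valid URL' : 'Invalid URL') . PHP_EOL;
}
```

The filter requires a scheme, so bare host names are rejected even though a regex with an optional scheme part would accept them.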

None of these validating functions is perfect, with zero %slippage and zero %overjudgement. Each has its own strengths and weaknesses.

Recommendations

So what is the recommended validating function? Based on the evaluation results, you can select validating function #1, since it has the lowest combined risk of %slippage and %overjudgement. There may well be functions not featured here that are more accurate, but so far this one ranks well in Google search. Feel free to test your own function and post the results here if you get a much lower average. It will be beneficial to everyone.

So how can you increase the accuracy of validation? Below are some further recommendations.

First, you can blend the validating functions. Instead of relying on a single function, you can incorporate features from other functions, or add functionality of your own, to make it stronger. For example, the main weakness of validating function #1 is that it rejects valid URLs that contain capital letters and valid URLs that use an IP address as the host.

One way to improve the function is to convert each URL to lower case before passing it to the validating function. You can also let the FILTER_VALIDATE_URL filter (validating function #5) handle the validation when the host part of the URL is a numeric IP address. In these tests the built-in filter seems to be accurate when validating URLs that contain IP addresses.
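A minimal sketch of such a blend is shown below. It assumes a version of validating function #1 that returns true or false instead of echoing, passed in as the hypothetical $regexValidator callable; blendedValidate() is likewise an illustrative name. Only IPv4 hosts are routed to the built-in filter in this sketch.

```php
<?php
/**
 * Hypothetical blend of validating functions #1 and #5.
 * $regexValidator is assumed to be a bool-returning version of function #1.
 */
function blendedValidate(string $url, callable $regexValidator): bool
{
    // Lower-case a copy for checking only. Keep the original URL for later
    // use, because paths and query strings can be case-sensitive.
    $candidate = strtolower(trim($url));

    // parse_url() yields null or false here when it cannot find a host.
    $host = parse_url($candidate, PHP_URL_HOST);

    // Numeric IPv4 host: let the built-in filter decide.
    if (is_string($host) && filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
    }

    // Everything else goes to the regex-based validator.
    return (bool) $regexValidator($candidate);
}
```

Because parse_url() only reports a host when the URL has a scheme, an input such as 192.168.0.1/index.php still falls through to the regex validator in this sketch.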

Second, in PHP, there is a function called parse_url: http://php.net/manual/en/function.parse-url.php. You can feed it the output of validated URLs to retrieve the host name and other information for further processing, instead of using string manipulation functions; it saves time.
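As a short illustration, parse_url() splits a URL into named components. The URL below is a made-up example and is assumed to have already passed validation.

```php
<?php
// Hypothetical URL, assumed to have passed validation already.
$url = 'http://www.example.com:8080/path/page.php?id=5#top';

// Without a second argument, parse_url() returns an associative array
// containing only the components that are present in the URL.
$parts = parse_url($url);

echo $parts['scheme'] . PHP_EOL;    // scheme, e.g. http
echo $parts['host'] . PHP_EOL;      // host name
echo $parts['port'] . PHP_EOL;      // port as an integer
echo $parts['path'] . PHP_EOL;      // path
echo $parts['query'] . PHP_EOL;     // query string without the leading "?"
echo $parts['fragment'] . PHP_EOL;  // fragment without the leading "#"

// A single component can be requested directly.
$host = parse_url($url, PHP_URL_HOST);
```

Note that parse_url() does not validate anything itself; it only splits the string, which is why it belongs after the validation step.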

To decrease the possibility of overjudgement, you can also feed the URLs that the first validating function rejects to the parse_url function to see if it can retrieve the domain name. You can then check whether that domain exists. You'll find a script for this here: http://psoug.org/snippet/Check_If_Domain_Exists_31.htm
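One way to sketch that second check is with PHP's checkdnsrr(), which is a different approach from the linked snippet. The function name urlHostExists() and the optional $dnsLookup parameter are hypothetical; the parameter exists only so the lookup can be swapped out when testing offline. The sketch assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical second-chance check: extract the host with parse_url()
 * and ask DNS whether it resolves.
 */
function urlHostExists(string $url, ?callable $dnsLookup = null): bool
{
    $host = parse_url($url, PHP_URL_HOST);

    if (!is_string($host) || $host === '') {
        // Input without a scheme, such as "example.com/page", has no host
        // component, so retry with a scheme prepended.
        $host = parse_url('http://' . $url, PHP_URL_HOST);
    }
    if (!is_string($host) || $host === '') {
        return false;
    }

    if ($dnsLookup === null) {
        // Default lookup: accept the host if it has an A or AAAA record.
        $dnsLookup = function (string $name): bool {
            return checkdnsrr($name, 'A') || checkdnsrr($name, 'AAAA');
        };
    }

    return (bool) $dnsLookup($host);
}
```

A host that resolves only shows that the domain exists, not that the specific page does, so this reduces overjudgement without confirming the full URL.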

Finally, one other thing you can do is check whether the URL actually exists, starting from the manual page for the file_exists PHP function: http://php.net/manual/en/function.file-exists.php.

file_exists itself is meant for files and directories and cannot check http:// URLs, because the HTTP wrapper does not support it. The user-contributed tip on that page instead looks at the header status of the URL. For example, if the status is 404, then the URL does not exist. See the tip provided by vernon at kesnerdesigns.net here: http://php.net/manual/en/function.file-exists.php
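A header-status check of this kind can be sketched with PHP's get_headers(). This is an illustrative version under stated assumptions, not a copy of the tip on the manual page: the function names are hypothetical, any status below 400 is treated as "exists", and allow_url_fopen must be enabled for get_headers() to fetch remote URLs.

```php
<?php
/**
 * Extract the numeric status code from a line such as
 * "HTTP/1.1 404 Not Found". Returns null if the line is not a status line.
 */
function statusCodeFromLine(string $statusLine): ?int
{
    if (preg_match('~^HTTP/\S+\s+(\d{3})~', $statusLine, $matches)) {
        return (int) $matches[1];
    }
    return null;
}

/**
 * Hypothetical existence check: request the headers and treat any status
 * below 400 as "the URL exists". Needs network access when called.
 */
function urlResponds(string $url): bool
{
    // get_headers() returns false when the request fails outright;
    // @ suppresses the warning PHP would otherwise emit in that case.
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return false;
    }

    // The first element is the status line of the first response.
    $code = statusCodeFromLine($headers[0]);
    return $code !== null && $code < 400;
}
```

Because the first status line may be a redirect such as 301, this sketch counts redirects as existing URLs; stricter handling would need to inspect the later status lines as well.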

In summary, relying on a validating function alone to verify the integrity of a URL is not a complete solution. Other, related functions need to be added to complete the checking.



 
 