To test the above functions properly, we need a PHP script that will serve as a vehicle for testing. The six functions use fairly similar testing scripts, such as the one discussed below for choice #1.

Discussion of the testing script

First, the validating function is added to the top of the PHP script. Then all of the URLs to be tested are placed in an external text file (data.txt) in the same directory as the validating script, one URL per line. This is basically how the URLs to be tested are arranged inside data.txt:
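The actual list ships with the download linked below; hypothetical entries illustrating the format (one URL per line, mixing acceptable and malformed URLs) might look like this:

http://www.example.com
https://example.com/page?x=1
ftp://ftp.example.org/file.txt
htp:/www.broken-example
www.missing-scheme.com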
Then this line:

$a = file('data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

grabs the contents of the text file (the URLs listed previously) and puts them into the array variable $a. Once the contents of the text file are in the array, this loop:

while (list($key, $value) = each($a)) {

reads each URL found in the array and validates it against the validating function. The output is then sent to the web browser, marking each entry as either a valid or an invalid URL. If you want to test more URLs, you can simply add them to data.txt.

You can download the URL validation script here for reference and for your own use: http://www.php-developer.org/wp-content/uploads/tutorials/urlvalidation.zip

Unzip it and run it in your XAMPP localhost: http://localhost/urlvalidation/

Analysis and Discussion of Results

You can also view the results online here: http://www.php-developer.org/urlvalidation/ This is how the results will look, for example, for regex #1: http://www.php-developer.org/urlvalidation/1.php
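Putting these pieces together, a minimal version of the testing script might look like the sketch below. This is not the exact script from the download; isValidUrl() is a hypothetical stand-in for whichever validating function (#1 to #6) is under test.

<?php
// Hypothetical stand-in for the validating function under test.
function isValidUrl($url)
{
    // Stand-in body so the sketch runs; paste the regex under test here.
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

// Read the URLs, one per line, skipping blank lines.
$a = file('data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// Note: each() was deprecated in PHP 7.2 and removed in PHP 8.0, so a
// foreach loop is used here in place of the original while/list/each() idiom.
foreach ($a as $key => $value) {
    $verdict = isValidUrl($value) ? 'valid URL' : 'invalid URL';
    echo ($key + 1) . '. ' . $value . ' -- ' . $verdict . '<br />';
}
?>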
Let's discuss how you are going to analyze each of these validating functions. First, the URLs numbered 41 to 67 are malformed URLs, and this regex fails to detect 3 of them (instead of marking them "invalid URL," it marks them "valid URL"). So the %slippage can be computed:

%slippage = 3/27 = 11.11% (the lower the percentage, the better)

The URLs numbered 1 to 40 are acceptable URLs. This validating function over-judged 14 URLs (instead of marking the URL as valid, it marks it invalid). Therefore:

%overjudgement = 14/40 = 35% (the lower the percentage, the better)

Finally, if you need to measure the overall performance of a validating function, you can simply average its %slippage and %overjudgement; the arithmetic for regex #1 is spelled out in the snippet that follows. Below is the summary for the rest of the validating functions:
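Here, expressed as a short PHP snippet, is the arithmetic behind regex #1's scores; the counts come straight from the results above.

<?php
// Worked example of the metrics for regex #1.
$malformedTotal  = 27; // URLs 41 to 67
$slipped         = 3;  // malformed URLs wrongly marked "valid URL"
$acceptableTotal = 40; // URLs 1 to 40
$overjudged      = 14; // acceptable URLs wrongly marked "invalid URL"

$slippage      = $slipped / $malformedTotal * 100;     // about 11.11%
$overjudgement = $overjudged / $acceptableTotal * 100; // 35%
$average       = ($slippage + $overjudgement) / 2;     // about 23.06%

printf("%%slippage = %.2f%%, %%overjudgement = %.2f%%, average = %.2f%%\n",
    $slippage, $overjudgement, $average);
?>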
Based on the results, the first validating function (#1) produces the lowest average in terms of %slippage and %overjudgement. Validating function #4 does have zero slippage, but it over-judges 87.5% of the time, so its average is poor. FILTER_VALIDATE_URL, which is the #5 validating function, averages 31.62%; it has a low %overjudgement, but its %slippage is high. You can also see that there is no perfect validating function with zero %slippage and zero %overjudgement; each of these functions has its own strengths and weaknesses.

Recommendations

So what is the recommended validating function? Based on the evaluation results, you can select validating function #1, since its risk of %slippage and %overjudgement is minimal. There may well be functions not featured here that are more accurate, but so far this function ranks well in Google search. Feel free to test your own function and post the results here if you achieve a much lower %average; it will be beneficial to everyone.

So how can you increase the accuracy of validation? Below are some further recommendations.

First, you can blend the validating functions. Instead of relying on a single function, you can incorporate features from other functions, or add functionality to make the result stronger. For example, validating function #1's main weakness is that it cannot validate capitalized URLs or URLs whose host is an IP address. One improvement is to convert all URLs to lower case before feeding them to the validating function. You can also let FILTER_VALIDATE_URL (validating function #5) handle the validation whenever the URL's host is numeric, i.e., an IP address; the PHP built-in filter seems to be accurate when validating IP addresses as part of a URL. A sketch combining these ideas appears below.

Second, PHP has a function called parse_url (http://php.net/manual/en/function.parse-url.php). You can feed it validated URLs to retrieve the host name and other components for further processing, instead of using string manipulation functions; it saves time. To decrease the possibility of overjudgement, you can also feed the URLs that the first validating function rejects to parse_url to see whether it can retrieve the domain name, and then check whether that domain exists. You'll find a script for that here: http://psoug.org/snippet/Check_If_Domain_Exists_31.htm

Finally, you can incorporate the idea behind the file_exists PHP function (http://php.net/manual/en/function.file-exists.php): checking whether the URL actually exists. Since file_exists() itself only checks local paths, the practical approach for URLs is to look at the HTTP header status; for example, if it is 404, the URL does not exist. See the tip provided by vernon at kesnerdesigns.net here: http://php.net/manual/en/function.file-exists.php

In summary, relying on a validating function alone to verify the integrity of a URL is not a complete solution; other, related functions need to be added to complete the checking.
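As a concluding illustration, here is one way the blended approach might be put together. The helper names (isValidUrl1, blendedValidate, urlResponds) are hypothetical, and isValidUrl1() merely stands in for validating function #1 from earlier in the article.

<?php
// Hypothetical stand-in for validating function #1.
function isValidUrl1($url)
{
    // Stand-in body so the sketch runs; substitute the actual regex of #1.
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

function blendedValidate($url)
{
    // Function #1 cannot handle capitalized URLs, so lowercase first.
    $url = strtolower($url);

    // If the host is an IP address, delegate to FILTER_VALIDATE_URL
    // (validating function #5), which handles IP hosts more accurately.
    $host = parse_url($url, PHP_URL_HOST);
    if ($host !== null && $host !== false
        && filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return filter_var($url, FILTER_VALIDATE_URL) !== false;
    }

    return isValidUrl1($url);
}

function urlResponds($url)
{
    // Fetch the HTTP response headers; a 404 status line means the URL
    // does not exist. (file_exists() itself only checks local paths.)
    $headers = @get_headers($url);
    return $headers !== false && strpos($headers[0], '404') === false;
}

// Example usage:
var_dump(blendedValidate('HTTP://WWW.EXAMPLE.COM'));        // lowercased, then regex #1
var_dump(blendedValidate('http://192.168.1.1/index.html')); // delegated to filter_var()
?>

In practice you would call urlResponds() only on URLs that already pass blendedValidate(), since fetching headers over the network is far slower than pattern matching.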