Determine Link Relevance and Unique Class C IP using Yahoo Links API

The Yahoo Site Explorer Inbound Links API comes in handy for a variety of purposes. You can get all sorts of data from it using PHP, from counting your backlinks to analyzing where they’re coming from. This article will show you how to build a backlink checker tool that can do this and more.

This article continues a series of PHP tutorials on the Yahoo Site Explorer Inbound Links API. If you are new to the Yahoo Links API, read the following tutorials first:

Using the Yahoo Site Explorer Inbound Links API 

Getting Data from Yahoo Site Explorer Inbound Links API using PHP 

Analyzing the Back Link Count from Unique Domains using Yahoo Inbound Links API 

Building a Form to Count Back Links Using Yahoo Inbound Links API 

Count Backlinks from Unique Domains Using Yahoo Inbound Links API 

None of those five articles discusses how to make a backlink checker tool that also analyzes inbound link relevance and checks whether the unique linking domains sit on unique Class C IP addresses.

This tutorial covers both topics in depth.

{mospagebreak title=How to Sort Out Unique Class C IP Addresses}

The original source code of the above tutorial series is here: http://www.php-developer.org/wp-content/uploads/scripts/countuniquelinks.zip

It is not capable of sorting out unique class C IP addresses. However, since the script is capable of extracting the domain name, you can get the IP address of the website server using the gethostbyname() function in PHP.

Using string manipulation functions, you can then strip the last octet from that address to arrive at its Class C portion. Add the following script to the original code:

//Start of Unique Class C IP Address Script
//Define unique IP address array
$iparray = array();

//Loop through the $uniquedomains array to extract domain names
//(foreach replaces each(), which was removed in PHP 8)
foreach ($uniquedomains as $key => $value) {

    //Get IP address of the extracted domain
    $ipaddress = gethostbyname($value);

    //Do string manipulation to extract the Class C IP address:
    //measure the last ".xxx" segment and cut it off the end
    $findme = '.';
    $lastdot = strlen(strrchr($ipaddress, $findme));
    $filterclassc = (-1) * $lastdot;
    $classcip = substr($ipaddress, 0, $filterclassc);
    $classcip = trim($classcip);

    //Assign the Class C IP address to $iparray array
    $iparray[] = $classcip;

    //Output the domain names to the user
    echo $value;
    echo '<br />';
}

//Extract the unique Class C IP addresses in $iparray
$arrayipunique = array_unique($iparray);

//Determine the number of unique Class C IP backlinks
$uniqueip = sizeof($arrayipunique);
echo '<br />';

//Compute statistics and output the results to the user
echo "In those $uniqueinsample unique domains, there are $uniqueip domains in unique Class C IP.<br />";
$uniquebacklinksuniqueclassc = ($uniqueip / $count) * ($totalbacklinksnotunique);
echo "<b>ESTIMATED TOTAL BACKLINKS FROM UNIQUE DOMAINS IN UNIQUE CLASS IP POINTING TO $conditions:&nbsp;" . round($uniquebacklinksuniqueclassc) . '</b>';
echo '<br />';
echo '<br />';
//End of Unique Class C IP address script
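
The string manipulation above can also be written as a small helper. The following is only a sketch, not part of the original script: the function name class_c_prefix() and the sample addresses are hypothetical. It validates the input first, because gethostbyname() returns the hostname unchanged when the lookup fails, and such a value should not be counted as an IP address.

```php
<?php
// Hypothetical helper (not in the original script): returns the first
// three octets of an IPv4 address, or null if the input is not valid IPv4.
function class_c_prefix(string $ip): ?string
{
    // gethostbyname() hands back the hostname itself on failure,
    // so reject anything that is not a well-formed IPv4 address.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    $octets = explode('.', $ip);
    return implode('.', array_slice($octets, 0, 3));
}

// Hypothetical sample of values as gethostbyname() might return them;
// the last entry mimics a failed lookup.
$resolved = array('192.0.2.10', '192.0.2.77', '198.51.100.5', 'no-such-host.invalid');

$prefixes = array();
foreach ($resolved as $ip) {
    $prefix = class_c_prefix($ip);
    if ($prefix !== null) {
        $prefixes[] = $prefix;
    }
}

// Two distinct prefixes remain: 192.0.2 and 198.51.100
$uniquePrefixes = array_values(array_unique($prefixes));
echo count($uniquePrefixes) . "\n"; // prints 2
```

In the tutorial's loop, the value passed to such a helper would be the result of gethostbyname($value); domains that fail to resolve would then simply be skipped instead of adding a bogus entry to $iparray.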

IMPORTANT NOTE: You do not need to assemble the code yourself. The complete and final source code is linked at the end of this tutorial.

{mospagebreak title=Determine the Quality and Relevance of Inbound Links}

This is trickier to do, but the strategy is simple. The Yahoo Links API returns the title tag of each inbound linking page. Since a title tag tells us what a page is about, you can analyze the keywords in those title tags and report statistics that show whether the inbound links are topically related to the website. The code to be added to the original script is as follows:

//Extract the title tag from the Yahoo Links API array
//This is placed below the previous code to extract the URL, which is:
//$myurl= $display1[$x]['Url'];

$mytitletag = $display1[$x]['Title'];

//You also need to assign the title tag to an array.
//This is placed in the last section, just below this line:
//$domainarray[]=$domain; and before $x++
//$titlearray[] contains the title tags of all inbound link
//pages to the website

$titlearray[] = $mytitletag;

//Some inbound link pages come from the same domain and therefore
//share the same title tag, so you need to filter the unique title
//tags from the array. This code is placed just outside the
//while loop: while ($x < $count) {
//}
//And just before:
//$uniquedomains=array_unique($domainarray);

$uniquetitletags = array_unique($titlearray);

//Start of link relevance analysis

//Reset array

reset($uniquedomains);
echo '---------------------------------------------------';

echo '<br />';

echo '|THIS IS THE LINK RELEVANCE REPORT FOR '.$domainurl;

echo '<br />';

echo '---------------------------------------------------';
echo '<br />';

echo '<br />';

//A short explanation about the link relevance analysis strategy

echo 'The relevance of your backlinks is computed based on the title tags of your backlinking pages. Title tags are important because they tell us what each backlink page is about. The keywords from the title tags are then extracted and analyzed.';

echo '<br />';

echo 'If this keyword list, sorted by percentage, MATCHES your domain or website topic or niche, then congratulations: your backlinks are relevant to your website.<br />';

echo '<br />';


//Combine all words inside the unique title tags array to make it as
//one sentence for the analysis


$sentenceforanalysis= implode(" ",$uniquetitletags);


//Compute the keyword occurrence percentage

//Parts of the code are taken from http://bit.ly/5egil, authored by Tom
//str_word_count($str,1) – returns an array containing all the words
//found inside the string


$words = str_word_count(strtolower($sentenceforanalysis),1);


//Count the number of words

$numWords = count($words);


//array_count_values() returns an array using the values of the input
//array as keys and their frequency in input as values.


$word_count = (array_count_values($words));


//sort the results

arsort($word_count);


//stopwords PHP array by Armand Brahaj found here:
//http://bit.ly/4vNhpu
//It is important to exclude the stop words from the analysis because
//they are not vital for relevance computations
//the script for stopwordslist.php can be found here:
//http://bit.ly/910X2Q
//you need to change the path of the PHP include to reflect your own
//file path

include '/opt/lampp/htdocs/backlinkcount/stopwordslist.php';

//now that the stopwords array is in place,
//you need to check whether the keywords in the title tags are stop words
//first define $stopwordarray, which will contain the stop words
//found in the title tag keywords

$stopwordarray = array();

//running total of excluded word occurrences

$stopwordcount = 0;

//next is to loop through the array

foreach ($word_count as $key=>$val) {

//to count the stop words, first gather all the stop words found in the
//title tag keywords according to stopwordslist.php
//this is done by assigning the stop words to $stopwordarray

//Also check whether the word consists entirely of letters; this
//filters out non-words in the title, which are not important for
//analysis

//Also excluded from the analysis are words of fewer than 3
//characters, which are not important either.
//if any of the above conditions is true, the keyword is assigned to
//$stopwordarray[]

if ((in_array($key,$stopwords)) || (!(ctype_alpha($key))) || (strlen($key) < 3 )) {

$stopwordarray[] = $key;

//$val is the number of times this word occurs, so add it to the total;
//counting the entries of $stopwordarray would count each word only once,
//while $numWords counts every occurrence

$stopwordcount += $val;

}

}


//Compute the number of words without stop words and other excluded
//words

$adjustednumWords = $numWords - $stopwordcount;


//Finally loop through the array again to display the relevance
//statistics detail to the user

foreach ($word_count as $key=>$val) {

if ((!(in_array($key,$stopwords))) && (ctype_alpha($key)) && (strlen($key) > 2)) {

//it is NOT a STOP word; display these keywords to the user
//as well as the percent occurrences

    echo "<b>$key = $val</b>. Percent occurrence: ".number_format(($val/$adjustednumWords)*100)."%<br/>\n";

 }

}
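
The keyword analysis above can be condensed into one function for experimenting outside the full tool. This is a minimal sketch under stated assumptions: the function name keyword_percentages(), the sample titles and the short stop word list are all hypothetical, and the percentage is computed against the total occurrences of the kept words.

```php
<?php
// Hypothetical helper: maps each kept keyword to its share (in percent)
// of all kept keyword occurrences in the given title tags.
function keyword_percentages(array $titles, array $stopwords): array
{
    // Use each distinct title once, joined into one lowercase string.
    $text = strtolower(implode(' ', array_unique($titles)));
    $counts = array_count_values(str_word_count($text, 1));

    // Keep alphabetic words of 3+ characters that are not stop words.
    $kept = array();
    foreach ($counts as $word => $count) {
        $word = (string) $word;
        if (strlen($word) > 2 && ctype_alpha($word) && !in_array($word, $stopwords, true)) {
            $kept[$word] = $count;
        }
    }

    $total = array_sum($kept);
    if ($total === 0) {
        return array(); // nothing left to analyze
    }
    arsort($kept); // most frequent keywords first

    $percentages = array();
    foreach ($kept as $word => $count) {
        $percentages[$word] = round($count / $total * 100, 1);
    }
    return $percentages;
}

// Hypothetical titles and stop words, for illustration only.
$titles = array('Linux SEO Tools', 'Free SEO Tools for Linux', 'Linux SEO Tools');
$stopwords = array('for', 'the', 'and');
$report = keyword_percentages($titles, $stopwords);
print_r($report);
```

With these sample titles, the duplicate title is dropped, "for" is filtered as a stop word, and "linux", "seo" and "tools" each account for two of the seven remaining keyword occurrences.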

{mospagebreak title=Final Project Source Code and Testing}

You can download the final source code as discussed in this tutorial. 

You need to do the following before you can fully implement the script for your own personal use:

  • Get a Recaptcha public key and private key. You can get these at http://www.google.com/recaptcha
  • Get a Yahoo Links API Application ID. The procedure was already discussed in the previous tutorials.
  • Change the include path of the stop words (refer to the above script).
  • If you wish, you can even add more stop words or any words that you do not want to be part of the keyword title list. You can simply edit the stopwordslist.php file to add more array elements.
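
As a rough illustration of that last step, the stop word file only needs to define the $stopwords array that the script checks with in_array(). The entries below are hypothetical; the real stopwordslist.php is much longer. Add words in lowercase, because the script lowercases the title text before counting.

```php
<?php
// Hypothetical excerpt of stopwordslist.php; the real list is much longer.
$stopwords = array('the', 'and', 'for', 'with');

// Append any site-specific words you want kept out of the keyword report.
// Lowercase only: the titles are passed through strtolower() before matching.
$stopwords[] = 'home';
$stopwords[] = 'welcome';
```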

Here is a screen shot of the web application before any data processing:

Below is an example using this tool.

The live version of the tool is here: http://www.php-developer.org/backlinkcount/checkyourlinkpopularity.php

Suppose you are interested in getting all the backlinks pointing to all of the pages of http://www.devshed.com, not just the home page.

To do that, enter http://www.devshed.com under “Enter root domain URL.” Do not place a trailing slash “/” at the end of the website’s URL.

Since you are interested in getting all of the backlinks pointing to all of the pages, select the option “Entire website.”

Enter the captcha in the form and click “Submit.” Processing can take up to two minutes.

The first thing you should see is this line:
ESTIMATED TOTAL BACKLINKS FROM UNIQUE DOMAINS POINTING TO ENTIRE SITE: 285006

This means that an estimated 285006 backlinks come from unique domains, but these domains have NOT yet been checked to see whether they belong to unique Class C IP addresses.

The second bolded summary is this: ESTIMATED TOTAL BACKLINKS FROM UNIQUE DOMAINS IN UNIQUE CLASS IP POINTING TO ENTIRE SITE: 157044

So 157044 is an estimate of the backlinks to the pages of www.devshed.com that come from unique domains on unique Class C IP addresses.
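
Both figures are extrapolations from a sample, following the script's formula ($uniqueip/$count)*($totalbacklinksnotunique). The sketch below wraps that formula in a hypothetical function, assuming $count is the number of links in the retrieved sample. The numbers are made up for illustration and are not taken from the Dev Shed run.

```php
<?php
// Hypothetical wrapper around the tutorial's formula:
// (unique Class C prefixes in sample / sample size) * total backlinks.
function estimate_unique_classc_backlinks(int $uniqueInSample, int $sampleSize, int $totalBacklinks): int
{
    if ($sampleSize <= 0) {
        return 0; // nothing sampled, so nothing to extrapolate
    }
    return (int) round(($uniqueInSample / $sampleSize) * $totalBacklinks);
}

// Made-up figures: 60 unique prefixes found in a sample of 100 links,
// scaled up to a site with 1000 backlinks in total.
echo estimate_unique_classc_backlinks(60, 100, 1000) . "\n"; // prints 600
```

Because the ratio comes from a limited sample, the result is only as representative as that sample, which is why the tool labels its output as an estimate.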

Finally, one of the most important results is the “Link Relevance Report.”

The screen shot of the report above tells us that the pages linking back to www.devshed.com pages have a high percentage of “search,” “tools,” “fedora,” “seo” and “linux” used in their title tags, which are also related to the topics discussed on Dev Shed.
