Ever wondered if you could be emailed automatically whenever your favorite Web pages changed? Our intrepid developer didn't just wonder - he sat down and wrote some code to make it happen. Here's his story.
The first step in my script is to connect to the MySQL database and run a query to get a list of all the URLs to be checked.
// open database connection
$connection = mysql_connect($db_host, $db_user, $db_pass) or die ("Unable to connect!");
mysql_select_db($db_name);
// generate and execute query
$query1 = "SELECT id, url, date, dsc, email FROM urls2";
$result1 = mysql_query($query1, $connection) or die ("Error in query: $query1 . " . mysql_error());
Assuming the query returns one or more rows, the next step is to iterate through the result set and process each record:
// if rows exist
if (mysql_num_rows($result1) > 0)
{
    // iterate through resultset
    while (list($id, $url, $date, $desc, $email) = mysql_fetch_row($result1))
    {
        // processing code here
    }
}
For each URL found, I need to extract the host name and the file path on the server - this is extremely easy with PHP's very cool parse_url() function, which returns an associative array containing the various constituent elements of the URL.
// parse URL into component parts
$arr = parse_url($url);
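To make the shape of that array concrete, here is a small sketch using a made-up URL (www.example.com is a placeholder, not an address from the script). It assumes only the standard parse_url() function. Note that the 'path' element already begins with a slash, and that the key is missing altogether when the URL has no path.

```php
<?php
// Hypothetical URL, used purely for illustration
$url = "http://www.example.com/docs/index.html?lang=en";

// parse_url() splits the URL into its named components
$arr = parse_url($url);

// 'path' is absent for a bare host such as http://www.example.com,
// so fall back to the server root in that case
$host = $arr['host'];
$path = isset($arr['path']) ? $arr['path'] : "/";

echo "Host: $host\n";   // Host: www.example.com
echo "Path: $path\n";   // Path: /docs/index.html
```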
This data can then be used to open a socket connection to the host Web server, send an HTTP HEAD request, and place the response in a PHP variable.
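// open a client connection
$fp = fsockopen($arr['host'], 80) or die ("Unable to connect to host!");
// parse_url() already includes the leading slash in the path
$path = isset($arr['path']) ? $arr['path'] : "/";
// send HEAD request and read response
$request = "HEAD " . $path . " HTTP/1.0\r\nHost: " . $arr['host'] . "\r\n\r\n";
fputs($fp, $request);
// start with an empty response for every URL
$response = "";
while (!feof($fp))
{
    $response .= fgets($fp, 500);
}
fclose($fp);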
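The request step can also be wrapped in a pair of small functions, which makes it easier to handle a host that cannot be reached. What follows is a minimal sketch, not the script's actual code: buildHeadRequest() and fetchHeaders() are hypothetical helper names, the ten-second timeout is an arbitrary assumption, and the Host header is added on the assumption that the target server may use name-based virtual hosting.

```php
<?php
// Build a HEAD request from the array returned by parse_url().
// buildHeadRequest() is a hypothetical helper, not a PHP built-in.
function buildHeadRequest(array $parts): string
{
    // 'path' already carries its leading slash; it is absent for a bare host
    $path = isset($parts['path']) ? $parts['path'] : "/";

    // keep the query string, if any, so the right resource is requested
    if (isset($parts['query'])) {
        $path .= "?" . $parts['query'];
    }

    return "HEAD $path HTTP/1.0\r\nHost: {$parts['host']}\r\n\r\n";
}

// Send the request and return the raw response, or null if the
// connection could not be opened. Also a hypothetical helper.
function fetchHeaders(array $parts, int $timeout = 10): ?string
{
    $port = isset($parts['port']) ? $parts['port'] : 80;
    $fp = @fsockopen($parts['host'], $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null;
    }

    fputs($fp, buildHeadRequest($parts));
    $response = "";
    while (!feof($fp)) {
        $line = fgets($fp, 500);
        if ($line === false) {
            break;
        }
        $response .= $line;
    }
    fclose($fp);

    return $response;
}

// Show the request that would be sent; no network access happens here
echo buildHeadRequest(parse_url("http://www.example.com/docs/index.html"));
```

Returning null instead of calling die() lets the main loop skip an unreachable host and carry on with the remaining URLs.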
This response is then broken up into individual lines, and each line is scanned for the "Last-Modified" header - note my use of the ereg() function to accomplish this task. The corresponding date is then converted into a UNIX timestamp with the strtotime() function, and that timestamp is in turn formatted with gmdate() as a MySQL-compatible DATETIME string, suitable for entry into the MySQL table.
// split response into lines
$lines = explode("\r\n", $response);
// scan lines for "Last-Modified" header
foreach ($lines as $l)
{
    if (ereg("^Last-Modified:", $l))
    {
        // split into variable-value component
        $arr2 = explode(": ", $l);
        $newDate = gmdate("Y-m-d H:i:s", strtotime($arr2[1]));
        // snip
    }
}
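The same header scan can be tried out without touching the network. The sketch below runs it against a canned response; lastModifiedFromResponse() is a hypothetical helper name and the sample headers are invented for illustration. It swaps in preg_match() for ereg(), since ereg() was removed in PHP 7.0, and matches the header name case-insensitively.

```php
<?php
// Extract the Last-Modified header from a raw HTTP response and return it
// as a "Y-m-d H:i:s" string in GMT, or null if the header is absent or
// cannot be parsed. lastModifiedFromResponse() is a hypothetical helper.
function lastModifiedFromResponse(string $response): ?string
{
    foreach (explode("\r\n", $response) as $line) {
        // header names are case-insensitive, hence the /i modifier
        if (preg_match('/^Last-Modified:\s*(.+)$/i', $line, $m)) {
            $ts = strtotime(trim($m[1]));
            return $ts === false ? null : gmdate("Y-m-d H:i:s", $ts);
        }
    }
    return null;
}

// Canned response, invented for illustration
$sample = "HTTP/1.1 200 OK\r\n"
        . "Server: ExampleServer\r\n"
        . "Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT\r\n"
        . "Content-Type: text/html\r\n\r\n";

echo lastModifiedFromResponse($sample), "\n";   // 2015-10-21 07:28:00
```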
The date retrieved from the "Last-Modified" HTTP header is then compared with the date previously recorded for that URL in the database. If the dates are the same, it implies that the page located at that URL has not been modified since it was last checked. If they're different, it implies that a change has taken place and the user should be alerted to it. The database also needs to be updated with the new modification date, so as to provide an accurate benchmark for the next run of the script.
// if date has changed from last-recorded date
if ($date != $newDate)
{
    // send mail to owner
    mail($email, "$desc has changed!", "This is an automated message to inform you that the URL \r\n\r\n $url \r\n\r\nhas changed since it was last checked. Please visit the URL to view the changes.", "From: The Web Watcher <nobody@some.domain>") or die ("Could not send mail!");
    // update the same table the URLs were read from with the new date
    $query2 = "UPDATE urls2 SET date = '" . $newDate . "' WHERE id = '" . $id . "'";
    $result2 = mysql_query($query2, $connection) or die ("Error in query: $query2 . " . mysql_error());
}
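One case the comparison above does not cover is a server that sends no "Last-Modified" header at all. A minimal sketch of one way to handle it follows; needsAlert() is a hypothetical helper, and the choice to stay silent when the header is missing is an assumption rather than something the script itself does.

```php
<?php
// Decide whether a notification is due. $stored is the DATETIME string
// held in the database; $fresh is the value derived from the
// Last-Modified header, or null if the server sent none.
// needsAlert() is a hypothetical helper, not part of the original script.
function needsAlert(?string $stored, ?string $fresh): bool
{
    // without a header nothing can be concluded, so stay quiet
    if ($fresh === null) {
        return false;
    }

    // a URL that has never been checked counts as changed
    if ($stored === null) {
        return true;
    }

    // compare as timestamps (both treated as UTC) rather than as raw strings
    return strtotime("$stored UTC") !== strtotime("$fresh UTC");
}

var_dump(needsAlert("2015-10-21 07:28:00", "2015-10-21 07:28:00")); // bool(false)
var_dump(needsAlert("2015-10-21 07:28:00", "2015-10-22 09:00:00")); // bool(true)
var_dump(needsAlert("2015-10-21 07:28:00", null));                  // bool(false)
```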
It might look complicated - but it's actually pretty straightforward. Will it work?