Watching The Web
By: The Disenchanted Developer, (c) Melonfire
    2002-10-23


    Table of Contents:
  • Watching The Web
  • Code Poet
  • Digging Deep
  • Backtracking
  • Plan B
  • Closing Time



    Watching The Web - Digging Deep
    ( Page 3 of 6 )

    The first step in my script is to connect to the MySQL database and run a query to get a list of all the URLs to be checked.








    // open database connection
    $connection = mysql_connect($db_host, $db_user, $db_pass)
        or die("Unable to connect!");
    mysql_select_db($db_name);

    // generate and execute query
    $query1 = "SELECT id, url, date, dsc, email FROM urls";
    $result1 = mysql_query($query1, $connection)
        or die("Error in query: $query1 . " . mysql_error());

    Assuming the query returns one or more rows, the next step is to iterate through the resultset and process each record:

    // if rows exist
    if (mysql_num_rows($result1) > 0)
    {
        // iterate through resultset
        while (list($id, $url, $date, $desc, $email) = mysql_fetch_row($result1))
        {

            // processing code here

        }
    }

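The mysql_* functions used above were removed in PHP 7.0, so readers on a current PHP release will need a different database layer. Here is a rough sketch of the same "fetch every record" step using PDO. The helper name fetch_watched_urls() is my own invention, and I'm assuming the table is called urls, as in the UPDATE statement later in this script.

```php
<?php
// Hypothetical helper (not part of the original script): returns every
// watched URL as an associative array keyed by column name.
// Assumes a table named "urls" with the columns used in the article.
function fetch_watched_urls(PDO $pdo): array
{
    $stmt = $pdo->query("SELECT id, url, date, dsc, email FROM urls");
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

With a connection opened along the lines of new PDO("mysql:host=$db_host;dbname=$db_name", $db_user, $db_pass), the while() loop above becomes a plain foreach over fetch_watched_urls($pdo), reading $row['url'], $row['date'] and so on.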
    For each URL found, I need to extract the host name and the file path on the server - this is extremely easy with PHP's very cool parse_url() function, which returns an associative array containing the various constituent elements of the URL.

    // parse URL into component parts
    $arr = parse_url($url);

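To make the shape of that array concrete, here is a small illustration. The URL is a made-up example, not one from my database.

```php
<?php
// Illustrative URL only; any absolute HTTP URL would do.
$url = "http://www.example.com/docs/index.html?page=2";
$arr = parse_url($url);

// $arr['scheme'] is "http"
// $arr['host']   is "www.example.com"
// $arr['path']   is "/docs/index.html" (note the leading slash)
// $arr['query']  is "page=2"

// A URL with nothing after the host name has no 'path' key at all,
// so it is worth falling back to "/" before using it.
$bare = parse_url("http://www.example.com");
$path = isset($bare['path']) ? $bare['path'] : "/";
```

Two details matter for the next step: the path already carries its leading slash, and it may be missing entirely.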
    This data can then be used to open a socket connection to the host Web server, send an HTTP HEAD request, and place the response in a PHP variable.

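    // open a client connection
    $fp = fsockopen($arr['host'], 80);

    // send HEAD request and read response
    // parse_url() returns the path with its leading slash, or no path at all
    $path = isset($arr['path']) ? $arr['path'] : "/";
    $request = "HEAD " . $path . " HTTP/1.0\r\n\r\n";
    fputs($fp, $request);

    // start with an empty response for each URL
    $response = "";
    while (!feof($fp))
    {
        $response .= fgets($fp, 500);
    }
    fclose($fp);
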
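If you'd like to see exactly what goes over the wire, the request can be built by a small standalone function. This is only a sketch: build_head_request() is a hypothetical helper, and the Host header and query-string handling are my additions (many servers host several sites on one IP address and use the Host header to tell them apart), not something the script above sends.

```php
<?php
// Hypothetical helper (not part of the original script): builds the raw
// text of an HTTP HEAD request for the given absolute URL.
function build_head_request(string $url): string
{
    $arr = parse_url($url);

    // fall back to "/" when the URL has no path component
    $path = isset($arr['path']) ? $arr['path'] : "/";

    // keep the query string, if any, as part of the request target
    if (isset($arr['query'])) {
        $path .= "?" . $arr['query'];
    }

    // request line, Host header (an assumption, see above), blank line
    return "HEAD " . $path . " HTTP/1.0\r\n"
         . "Host: " . $arr['host'] . "\r\n"
         . "\r\n";
}
```

Passing the result to fputs() in place of $request is all it takes to try it out.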
    This response is then broken up into individual lines, and each line is scanned for the "Last-Modified" header - note my use of the ereg() function to accomplish this task. The corresponding date is then converted into a UNIX timestamp with the strtotime() function, and that timestamp is in turn formatted with gmdate() as a MySQL-compatible DATETIME string, suitable for entry into the MySQL table.

    // split response into lines
    $lines = explode("\r\n", $response);

    // scan lines for "Last-Modified" header
    foreach ($lines as $l)
    {
        if (ereg("^Last-Modified:", $l))
        {
            // split into variable-value component
            $arr2 = explode(": ", $l);
            $newDate = gmdate("Y-m-d H:i:s", strtotime($arr2[1]));

            // snip
        }
    }

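Like the mysql_* functions, ereg() was removed in PHP 7.0. Here is one possible equivalent using preg_match(), wrapped in a function so it can be tried against a canned response. The function name and the sample response are mine, made up for illustration.

```php
<?php
// Hypothetical helper (not part of the original script): returns the
// Last-Modified date as a "Y-m-d H:i:s" string in GMT, or null when the
// header is absent or its value cannot be parsed.
function last_modified_from_response(string $response): ?string
{
    foreach (explode("\r\n", $response) as $line) {
        // header names are case-insensitive, hence the /i modifier
        if (preg_match('/^Last-Modified:\s*(.+)$/i', $line, $m)) {
            $ts = strtotime($m[1]);
            return $ts === false ? null : gmdate("Y-m-d H:i:s", $ts);
        }
    }
    return null;
}

// Made-up sample response, for illustration only
$sample = "HTTP/1.0 200 OK\r\n"
        . "Last-Modified: Wed, 23 Oct 2002 10:00:00 GMT\r\n"
        . "\r\n";
$newDate = last_modified_from_response($sample);
// $newDate is "2002-10-23 10:00:00"
```

Returning null for a missing header lets the calling code tell "no date available" apart from "date unchanged".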
    The date retrieved from the "Last-Modified" HTTP header is then compared with the date previously recorded for that URL in the database. If the dates are the same, it implies that the page located at that URL has not been modified since it was last checked. If they're different, it implies that a change has taken place and the user should be alerted to it. The database also needs to be updated with the new modification date, so as to provide an accurate benchmark for the next run of the script.

    // if date has changed from last-recorded date
    if ($date != $newDate)
    {
        // send mail to owner
        mail($email, "$desc has changed!", "This is an automated message to inform you that the URL \r\n\r\n $url \r\n\r\nhas changed since it was last checked. Please visit the URL to view the changes.", "From: The Web Watcher <nobody@some.domain>")
            or die("Could not send mail!");

        // update table with new date
        $query2 = "UPDATE urls SET date = '" . $newDate . "' WHERE id = '" . $id . "'";
        $result2 = mysql_query($query2, $connection)
            or die("Error in query: $query2 . " . mysql_error());
    }

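The compare-and-update step can also be written with a PDO prepared statement, which keeps the values out of the SQL string. Again, this is a sketch under assumptions: record_if_changed() is a hypothetical helper, it expects a PDO connection rather than the mysql_connect() link used above, and it leaves the mail() call to the caller.

```php
<?php
// Hypothetical helper (not part of the original script): stores the new
// modification date when it differs from the recorded one, and reports
// whether a change was detected.
function record_if_changed(PDO $pdo, int $id, string $oldDate, string $newDate): bool
{
    // same dates: the page has not changed since the last check
    if ($oldDate === $newDate) {
        return false;
    }

    // different dates: save the new benchmark for the next run
    $stmt = $pdo->prepare("UPDATE urls SET date = :date WHERE id = :id");
    $stmt->execute([':date' => $newDate, ':id' => $id]);
    return true;
}
```

The caller would then send the notification only when the function returns true.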
    It might look complicated - but it's actually pretty straightforward. Will it work?

     
     
    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 4 Hosted by Hostway