PHP

Search This!
By: Colin Viebrock
    1999-03-15


    Table of Contents:
  • Search This!
  • Configuring ht://Dig
  • Indexing the Site
  • Building the Search Page
  • Performing the Search
  • Displaying the Results

    Search This! - Indexing the Site
    ( Page 3 of 6 )

    Before ht://Dig can search your site, it has to index it.

    ht://Dig retrieves HTML documents over HTTP and gathers information from them that it later uses to search them. In this way, it works very much like a search robot or web spider.

    [Note: ht://Dig can also operate locally, indexing a site through the local file system. While this is a faster way of indexing a site, it doesn't work very well when indexing dynamic pages. Why? Well, it would index the source of a PHP file, not the result. And that's not what you want.]
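To make the note above concrete, here is a minimal sketch of what a file-system read of a dynamic page returns. The temporary file, its contents, and the example URL are made up for illustration and are not part of the SummerWorks site.

```shell
#!/bin/sh
# Hypothetical page; the file and its contents are invented for this sketch.
page=$(mktemp)
cat > "$page" <<'EOF'
<html><body><?php echo "Welcome, " . $name; ?></body></html>
EOF

# Reading through the file system returns the file exactly as stored,
# PHP code included. This is what a local index would be built from.
raw_source() {
    cat "$1"
}

# Over HTTP the indexer only sees what the web server sends after PHP has
# run. This needs a running server, so it is left commented out.
# (Assumed URL, for illustration only.)
# curl -s http://www.example.com/page.php

if raw_source "$page" | grep -qF '<?php'; then
    echo "local read exposes PHP source"
fi
```

The HTTP route costs more time per page, but the index then matches what visitors actually see.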

    Every time you make a change to your site, you'll want to re-index it. So it's probably a good idea to write a little shell script that indexes your site for you, and add it to your crontab. Here is rundig.sh, a script that does just that for the SummerWorks site, and emails me the details. The changes you need to make for your site should be obvious.


    #! /bin/sh

    if [ "$1" = "-v" ]; then
        verbose="-v"
    fi

    # This is the directory where htdig lives
    BASEDIR=/usr/local/htdig

    # This is the db dir
    DBDIR=$BASEDIR/db/sw98

    # This is the directory htdig will use for temporary sort files
    TMPDIR=/tmp
    export TMPDIR

    # This is the name of a temporary report file
    REPORT=$TMPDIR/htdig.sw98

    # This is who gets the report
    REPORT_DEST="you@your-email-address.com"
    export REPORT_DEST

    # This is the subject line of the report
    SUBJECT="ht://Dig Report for SW98"

    # This is the name of the conf file to use
    CONF=sw98.conf

    # This is the PATH used by this script. Change it if you have problems
    # with not finding wc or grep.
    PATH=/usr/local/bin:/usr/bin:/bin

    ##### Dig phase
    STARTTIME=`date`
    echo Start time: $STARTTIME
    echo rundig: Start time: $STARTTIME > $REPORT
    $BASEDIR/bin/htdig $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
    TIME=`date`
    echo Done Digging: $TIME
    echo rundig: Done Digging: $TIME >> $REPORT

    ##### Merge Phase
    $BASEDIR/bin/htmerge $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
    TIME=`date`
    echo Done Merging: $TIME
    echo rundig: Done Merging: $TIME >> $REPORT

    ##### Cleanup Phase
    # To enable htnotify, uncomment the following line
    # $BASEDIR/bin/htnotify $verbose >>$REPORT

    # To enable the soundex or endings search, uncomment the following line
    $BASEDIR/bin/htfuzzy $verbose -c $BASEDIR/conf/$CONF endings

    # Move the work files
    mv $DBDIR/db.wordlist.work $DBDIR/db.wordlist
    mv $DBDIR/db.docdb.work $DBDIR/db.docdb
    mv $DBDIR/db.docs.index.work $DBDIR/db.docs.index
    mv $DBDIR/db.words.db.work $DBDIR/db.words.db

    END=`date`
    echo End time: $END
    echo rundig: End time: $END >> $REPORT
    echo

    # Grab the important statistics from the report file
    # All lines begin with htdig: or htmerge: or rundig:
    fgrep "htdig:" $REPORT
    echo
    fgrep "htmerge:" $REPORT
    echo
    fgrep "rundig:" $REPORT
    echo
    WC=`wc -l $REPORT`
    echo Total lines in $REPORT: $WC

    # Send out the report ...
    mail -s "$SUBJECT - $STARTTIME" $REPORT_DEST < $REPORT

    # ... and clean up
    rm $REPORT

    Run this from the command line with the -v switch and you can watch as your site is indexed! You'll need to run this as root (or the same user you installed ht://Dig as) so that it can create the necessary files in /usr/local/htdig/db.
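If you are not sure whether your current user can create files in the database directory, a quick check before the first run can save a failed index. The helper below is a hypothetical sketch, not part of ht://Dig; it assumes the same DBDIR value that rundig.sh uses.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the directory exists and the
# current user may write to it.
can_write_db() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Same value as DBDIR in rundig.sh; change it to match your setup.
DBDIR=/usr/local/htdig/db/sw98

if can_write_db "$DBDIR"; then
    echo "OK: $DBDIR is writable by $(id -un)"
else
    echo "Cannot write to $DBDIR as $(id -un)." >&2
    echo "Run rundig.sh as root or as the user who installed ht://Dig." >&2
fi
```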

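Once a manual run works, the crontab entry mentioned earlier might look like the line printed by this sketch. The script location and the 03:00 schedule are assumptions chosen for illustration; use whatever path and time suit your site.

```shell
#!/bin/sh
# Hypothetical location of the script; adjust to wherever you saved it.
RUNDIG=/usr/local/htdig/bin/rundig.sh

# Print a crontab line that runs the given command every day at 03:00.
# Fields: minute hour day-of-month month day-of-week command
cron_entry() {
    echo "0 3 * * * $1"
}

cron_entry "$RUNDIG"

# To append the line to the current user's crontab, something like the
# following should work (left commented out so nothing is installed here):
# ( crontab -l 2>/dev/null; cron_entry "$RUNDIG" ) | crontab -
```

Since rundig.sh already mails its own report, you may prefer to redirect the script's standard output to /dev/null in the crontab line so that cron does not send a second message.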


     
     
    © 2003-2009 by Developer Shed. All rights reserved.