
Indexing the Site - PHP

Add search capabilities to your site using the popular open source tools PHP3 and ht://Dig.

  1. Search This!
  2. Configuring ht://Dig
  3. Indexing the Site
  4. Building the Search Page
  5. Performing the Search
  6. Displaying the Results
By: Colin Viebrock
March 15, 1999



Before ht://Dig can search your site, it has to index it.

ht://Dig retrieves HTML documents using the HTTP protocol and gathers information from these documents which can later be used to search them. In this way, it works very much like a search robot or web spider.

[Note: ht://Dig can also operate locally, indexing a site through the local file system. While this is a faster way of indexing a site, it doesn't work very well for dynamic pages. Why? Because it would index the source of a PHP file, not the output the server generates from it. And that's not what you want.]

Every time you make a change to your site, you'll want to re-index it. So it's probably a good idea to write a little shell script that indexes your site for you, and add it to your crontab. Here is rundig.sh, a script that does just that for the SummerWorks site, and emails me the details. The changes you need to make for your site should be obvious.
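As a rough sketch of the crontab step, the snippet below builds an entry that runs the script once a night. The install path and the 03:15 schedule are assumptions made for illustration, so substitute your own; the line that actually installs the entry is left commented out.

```shell
#!/bin/sh
# Hypothetical location of the indexing script; change it to match your setup.
RUNDIG=/usr/local/htdig/bin/rundig.sh

# Print a crontab entry for the given command.
# Fields: minute hour day-of-month month day-of-week command
# The 03:15 daily schedule is an arbitrary example.
cron_line() {
    printf '15 3 * * * %s\n' "$1"
}

cron_line "$RUNDIG"

# To install the entry, append it to the current user's crontab:
# ( crontab -l 2>/dev/null; cron_line "$RUNDIG" ) | crontab -
```

Pick a quiet hour for your server, since indexing fetches every page of the site over HTTP.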

#! /bin/sh

if [ "$1" = "-v" ]; then
    verbose="-v"
fi

# This is the directory where htdig lives
BASEDIR=/usr/local/htdig

# This is the db dir
DBDIR=$BASEDIR/db/sw98

# This is the directory htdig will use for temporary sort files
TMPDIR=/tmp
export TMPDIR

# This is the name of a temporary report file
REPORT=$TMPDIR/htdig.sw98

# This is who gets the report
REPORT_DEST="you@your-email-address.com"
export REPORT_DEST

# This is the subject line of the report
SUBJECT="ht://Dig Report for SW98"

# This is the name of the conf file to use
CONF=sw98.conf

# This is the PATH used by this script. Change it if you have problems
# with not finding wc or grep.
PATH=/usr/local/bin:/usr/bin:/bin

##### Dig phase
STARTTIME=`date`
echo Start time: $STARTTIME
echo rundig: Start time: $STARTTIME > $REPORT
$BASEDIR/bin/htdig $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
TIME=`date`
echo Done Digging: $TIME
echo rundig: Done Digging: $TIME >> $REPORT

##### Merge Phase
$BASEDIR/bin/htmerge $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
TIME=`date`
echo Done Merging: $TIME
echo rundig: Done Merging: $TIME >> $REPORT

##### Cleanup Phase
# To enable htnotify, uncomment the following line
# $BASEDIR/bin/htnotify $verbose >>$REPORT

# To enable the soundex or endings search, uncomment the following line
$BASEDIR/bin/htfuzzy $verbose -c $BASEDIR/conf/$CONF endings

# Move the work files
mv $DBDIR/db.wordlist.work $DBDIR/db.wordlist
mv $DBDIR/db.docdb.work $DBDIR/db.docdb
mv $DBDIR/db.docs.index.work $DBDIR/db.docs.index
mv $DBDIR/db.words.db.work $DBDIR/db.words.db

END=`date`
echo End time: $END
echo rundig: End time: $END >> $REPORT
echo

# Grab the important statistics from the report file
# All lines begin with htdig: or htmerge: or rundig:
fgrep "htdig:" $REPORT
echo
fgrep "htmerge:" $REPORT
echo
fgrep "rundig:" $REPORT
echo
WC=`wc -l $REPORT`
echo Total lines in $REPORT: $WC

# Send out the report ...
mail -s "$SUBJECT - $STARTTIME" $REPORT_DEST < $REPORT

# ... and clean up
rm $REPORT
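The statistics step near the end of the script works because every summary line in the report starts with the name of the program that wrote it. Here is a minimal sketch of that filtering on its own. The report contents are invented for illustration and are not real ht://Dig output, and the function name is ours. It uses grep -F, which is the current spelling of fgrep.

```shell
#!/bin/sh
# Made-up report text standing in for the real report file.
REPORT_TEXT='rundig: Start time: Mon Mar 15 03:15:00 1999
some verbose detail line
htdig: example summary line
htmerge: example summary line
rundig: End time: Mon Mar 15 03:20:00 1999'

# Print only the lines tagged by each phase, in the same order the
# script uses: htdig first, then htmerge, then rundig.
summary() {
    for tag in "htdig:" "htmerge:" "rundig:"; do
        printf '%s\n' "$1" | grep -F "$tag"
    done
}

summary "$REPORT_TEXT"
```

Untagged lines, such as the verbose detail, stay in the mailed report but are left out of the on-screen summary.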

Run this from the command line with the -v switch and you can watch as your site is indexed! You'll need to run this as root (or the same user you installed ht://Dig as) so that it can create the necessary files in /usr/local/htdig/db.
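The -v switch is simply passed through to the ht://Dig programs. The sketch below isolates that check in a small function (the function name is ours, purely for illustration) and prints the htdig command line that would result with and without the switch. Nothing is actually indexed.

```shell
#!/bin/sh
# Same test as at the top of rundig.sh, wrapped in a function so it can
# be tried in isolation. Prints "-v" if that was the first argument,
# and nothing otherwise.
verbose_flag() {
    if [ "${1:-}" = "-v" ]; then
        echo "-v"
    fi
}

# Show the command line rundig.sh would build in each case.
echo "with -v:    htdig $(verbose_flag -v) -s -a -c sw98.conf"
echo "without -v: htdig $(verbose_flag) -s -a -c sw98.conf"
```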
