Site Search with HTDIG (Page 4)

Script Barf - Administration

Want to add a search engine to your Web site but don't know how? Well, today's your lucky day! In this tutorial, find out how to obtain, install and use the popular ht://Dig indexing engine to add powerful, effective search capabilities to your site with minimal time and fuss.

TABLE OF CONTENTS:
  1. Site Search with HTDIG
  2. Digging Deep
  3. Source Control
  4. Script Barf
  5. Variable Control
  6. A Well-Formed Plan
  7. What You See
  8. Custom Job
  9. Out With The Old
  10. Caveat Emptor
  11. Ending The Dig
By: icarus, (c) Melonfire
April 12, 2004


In case the "configure" script barfs and spits messages at you about "installing the libstdc++ library", and if you're sure the library is already installed (the default situation if you're using GCC 3.x), you can try modifying the command above to include some additional variables:

 
cd /tmp/htdig-3.1.6
CXXFLAGS=-Wno-deprecated CPPFLAGS=-Wno-deprecated ./configure \
  --prefix=/usr/local/htdig \
  --with-cgi-bin-dir=/usr/local/apache/cgi-bin \
  --with-image-dir=/usr/local/apache/htdocs/htdig/images \
  --with-image-url-prefix=/htdig/images \
  --with-search-dir=/usr/local/apache/htdocs/htdig/sample

Next, compile and install it.

 
make 
make install 

ht://Dig should now have been installed to the directory "/usr/local/htdig".

You can verify this with a quick recursive listing of that directory. Here's what you should see:

 
ls -lR /usr/local/htdig
total 16 
drwxr-xr-x 2 root root 4096 Oct 15 18:32 bin
drwxr-xr-x 2 root root 4096 Oct 15 18:39 common
drwxr-xr-x 2 root root 4096 Oct 15 18:32 conf
drwxr-xr-x 2 root root 4096 Oct 15 18:44 db/

/usr/local/htdig/bin
total 2860 
-rwxr-xr-x 1 root root 580424 Oct 15 18:32 htdig
-rwxr-xr-x 1 root root 580424 Oct 15 18:32 htdump
-rwxr-xr-x 1 root root 390930 Oct 15 18:32 htfuzzy
-rwxr-xr-x 1 root root 580424 Oct 15 18:32 htload
-rwxr-xr-x 1 root root 381489 Oct 15 18:32 htmerge
-rwxr-xr-x 1 root root 376361 Oct 15 18:32 htnotify
-rwxr-xr-x 1 root root 2158 Oct 15 18:32 rundig*
 
/usr/local/htdig/common
total 6248 
-rw-r--r-- 1 root root 84 Oct 15 18:32 bad_words 
-rw-r--r-- 1 root root 923308 Oct 15 18:32 english.0 
-rw-r--r-- 1 root root 5756 Oct 15 18:32 english.aff 
-rw-r--r-- 1 root root 197 Oct 15 18:32 footer.html 
-rw-r--r-- 1 root root 891 Oct 15 18:32 header.html 
-rw-r--r-- 1 root root 194 Oct 15 18:32 long.html 
-rw-r--r-- 1 root root 1404 Oct 15 18:32 nomatch.html 
-rw-r--r-- 1 root root 2285568 Oct 15 18:39 root2word.db 
-rw-r--r-- 1 root root 67 Oct 15 18:32 short.html 
-rw-r--r-- 1 root root 14481 Oct 15 18:32 synonyms 
-rw-r--r-- 1 root root 90112 Oct 15 18:39 synonyms.db 
-rw-r--r-- 1 root root 1275 Oct 15 18:32 syntax.html 
-rw-r--r-- 1 root root 3022848 Oct 15 18:39 word2root.db 
-rw-r--r-- 1 root root 1108 Oct 15 18:32 wrapper.html
 
/usr/local/htdig/conf
total 12 
-rw-r--r-- 1 root root 8580 Oct 15 18:42 htdig.conf
 
/usr/local/htdig/db
total 236 
-rw-r--r-- 1 root root 63488 Oct 15 18:44 db.docdb 
-rw-r--r-- 1 root root 11991 Oct 15 18:42 db.docs 
-rw-r--r-- 1 root root 5120 Oct 15 18:44 db.docs.index 
-rw-r--r-- 1 root root 54004 Oct 15 18:44 db.wordlist 
-rw-r--r-- 1 root root 82944 Oct 15 18:44 db.words.db 
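
If you'd rather not eyeball the listing, the same check can be scripted. What follows is only a minimal sketch: "check_htdig_install" is a hypothetical helper name (it is not part of ht://Dig), and the directory and program names it looks for are simply the ones that appear in the listing above.

```shell
#!/bin/sh
# Sketch only: check_htdig_install is a hypothetical helper, not an ht://Dig tool.
# It looks for the directories and programs shown in the listing above.
check_htdig_install() {
    prefix="${1:-/usr/local/htdig}"   # defaults to the --prefix used in this tutorial
    missing=0

    # Directories expected under the installation prefix
    for dir in bin common conf db; do
        if [ ! -d "$prefix/$dir" ]; then
            echo "missing directory: $prefix/$dir"
            missing=1
        fi
    done

    # Programs expected to be present and executable in bin/
    for prog in htdig htfuzzy htmerge htnotify rundig; do
        if [ ! -x "$prefix/bin/$prog" ]; then
            echo "missing or not executable: $prefix/bin/$prog"
            missing=1
        fi
    done

    # The main configuration file
    if [ ! -f "$prefix/conf/htdig.conf" ]; then
        echo "missing file: $prefix/conf/htdig.conf"
        missing=1
    fi

    # Exit status 0 means everything was found, 1 means something is missing
    return "$missing"
}
```

Calling "check_htdig_install /usr/local/htdig" prints one line per missing item and returns a non-zero status if anything is absent, so it can be dropped into a larger install script.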


The Search Binary

The search binary should have been installed to "/usr/local/apache/cgi-bin/htsearch",

 
ls -l /usr/local/apache/cgi-bin
total 560 
-rwxr-xr-x 1 root root 558796 Oct 15 18:32 htsearch
-rw-r--r-- 1 root root 268 Aug 18 16:37 printenv 
-rw-r--r-- 1 root root 757 Aug 18 16:37 test-cgi 

with a sample search form and images to "/usr/local/apache/htdocs/htdig/".

For an explanation of what each binary does, see the ht://Dig documentation.

Once you've got ht://Dig installed, the next step is to configure it and start indexing your site. Let's look at that next.

Building An Index

ht://Dig is configured via a single configuration file, named "htdig.conf" and located in the installation's "conf" directory. Most of the time, this configuration file is set up automatically based on the arguments you passed to the "configure" script, and only needs to be altered to reflect the URL at which indexing should begin.

Pop open this file in your favourite text editor, and look for the "start_url" variable:

 

# This specifies the URL where the robot (htdig) will start. You can specify 
# multiple URLs here. Just separate them by some whitespace. 
# The example here will cause the ht://Dig homepage and related pages to be 
# indexed. 
# You could also index all the URLs in a file like so: 
# start_url: `${common_dir}/start.url` 

start_url: http://localhost/ 

Alter this variable to reflect the URL at which indexing should begin, and save the changes back to the file.
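
If you need to make this change on several machines, or from an install script, the edit can also be done non-interactively. The snippet below is a rough sketch rather than anything shipped with ht://Dig: "set_start_url" is a hypothetical helper name, and it assumes the configuration file contains a single uncommented "start_url:" line, as in the excerpt above.

```shell
#!/bin/sh
# Sketch only: set_start_url is a hypothetical helper, not an ht://Dig tool.
# Usage: set_start_url /path/to/htdig.conf http://www.example.com/
# Assumption: the URL contains no "|" or "&" characters, since both are
# special in the sed expression used below.
set_start_url() {
    conf="$1"
    url="$2"

    # Keep a backup copy of the original configuration
    cp "$conf" "$conf.bak" || return 1

    # Rewrite only the uncommented start_url line; lines beginning with "#"
    # (such as the commented-out file-based example) are left untouched
    sed "s|^start_url:.*|start_url: $url|" "$conf.bak" > "$conf"
}
```

For the installation described here, that would be called as "set_start_url /usr/local/htdig/conf/htdig.conf http://www.example.com/", where the URL is a placeholder for your own site. The original file is kept alongside as "htdig.conf.bak".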



 
 