Site Search with HTDIG - Variable Control (Page 5 of 11)
You can also alter a number of other variables that control ht://Dig's behaviour through the configuration file. Among other things, you can change the location of the search database, specify URLs and file extensions to be skipped during indexing, enable or disable the fuzzy-matching algorithms, limit the amount of content stored in the search database, and cap the amount of data read over a single HTTP connection.
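To make those options concrete, here is a minimal sketch that writes a sample configuration fragment to a scratch file rather than touching the live configuration. The attribute names are the ones I recall from the ht://Dig attribute reference, so check them against the documentation for your version; every path, pattern and number is an assumption chosen for illustration, not a recommended setting.

```shell
#!/bin/sh
# Sketch only: write a sample fragment to a scratch file instead of
# editing the real configuration. All values below are assumptions.
sample_conf="${TMPDIR:-/tmp}/htdig-sample.conf"

cat > "$sample_conf" <<'EOF'
# Where the search database files are kept
database_dir:     /usr/local/htdig/db

# URLs containing any of these patterns are skipped while indexing
exclude_urls:     /cgi-bin/ .cgi

# Files with these extensions are never retrieved
bad_extensions:   .gif .jpg .zip .pdf

# Search algorithms and their weights; remove the fuzzy ones to disable them
search_algorithm: exact:1 endings:0.5 synonyms:0.1

# Bytes of each document stored in the database for excerpts
max_head_length:  10000

# Maximum number of bytes read from any one document
max_doc_size:     100000
EOF

# List the attribute names that the fragment sets
sed -n 's/^\([a-z_]*\):.*/\1/p' "$sample_conf"
```

Once the values have been reviewed, the relevant lines can be copied by hand into the real configuration file.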
The next step is to actually build the search database. As noted previously, when indexing a Web site, ht://Dig recursively spiders the site(s) and builds an index of all the unique words it finds. This process is activated via the "rundig" script, found in the installation's "bin" directory:
$ /usr/local/htdig/bin/rundig
New server: localhost, 80
0:0:0:http://localhost/: +* size = 487
1:1:1:http://localhost/company/: -+++* size = 2867
2:2:2:http://localhost/services/: -***+++++- size = 5219
...
htmerge: Sorting...
htmerge: Merging...
htmerge: 100:creative
htmerge: 200:good
htmerge: 300:online
htmerge: 400:specifically
...
htfuzzy/endings: words: 13200
htfuzzy/endings
htfuzzy/synonyms: 1519 worshipping
htfuzzy/synonyms: Done.
htfuzzy: Done.
The "rundig" script reads the configuration file to determine which URL to use as the root for indexing, and then begins traversing and scanning the pages under that URL.
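Since indexing a large site can take a while, it may help to wrap the call so that its output is kept in a log. The following is a rough sketch, not part of ht://Dig: the function name "run_index", the RUNDIG and LOG variables and the log location are all hypothetical, and it assumes that "rundig" exits with a non-zero status when something goes wrong, which may not hold for every version.

```shell
#!/bin/sh
# Hypothetical wrapper around rundig: keep a log and report the outcome.
# RUNDIG and LOG are assumptions; adjust them to the local installation.
RUNDIG="${RUNDIG:-/usr/local/htdig/bin/rundig}"
LOG="${LOG:-${TMPDIR:-/tmp}/rundig.log}"

run_index() {
    # Refuse to continue if the script is missing or not executable
    if [ ! -x "$RUNDIG" ]; then
        echo "rundig not found at $RUNDIG" >&2
        return 1
    fi
    # Capture both standard output and standard error in the log
    if "$RUNDIG" > "$LOG" 2>&1; then
        echo "indexing finished; see $LOG"
    else
        echo "indexing failed; see $LOG" >&2
        return 1
    fi
}
```

Calling run_index from a shell that has sourced this file starts the indexing run and leaves the full transcript in the log file for later inspection.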
Once it's done, the search database has been created in the installation's "db" directory and is ready for use. The next step is to integrate the ht://Dig search form and form processor into the Web site.
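Before moving on, a quick sanity check can confirm that the indexing run actually left something behind. This is only a sketch: the helper name "db_ready" is made up, the default path is assumed from the installation prefix used earlier, and it checks merely that the directory is non-empty, not that the database files are valid.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the database directory exists
# and holds at least one entry. The default path is an assumption.
db_ready() {
    dir="${1:-/usr/local/htdig/db}"
    [ -d "$dir" ] || return 1
    # ls -A lists every entry except . and ..; empty output means no files
    [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}
```

For example, db_ready && echo "search database present" prints the message only when the directory contains files.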
By icarus, (c) Melonfire