Want to add a search engine to your Web site but don't know how? Well, today's your lucky day! In this tutorial, find out how to obtain, install and use the popular ht://Dig indexing engine to add powerful, effective search capabilities to your site with minimal time and fuss.
So far, the examples have assumed that ht://Dig is indexing a Web site made up of static HTML pages. But on today's interactive Web, such sites are far less common than database-backed, highly interactive, content-rich portals. How does ht://Dig cope with one of these?
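The answer, not surprisingly, is: quite well. In most cases, you don't need to do anything special to get ht://Dig to index a database-driven site. Simply give it the starting URL as usual, and the program will follow links through the dynamically generated content and build an index from it. One caveat: the default value of the "exclude_urls" attribute skips URLs containing "/cgi-bin/" or ".cgi", so if your dynamic pages live under such paths, you'll need to adjust that setting.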
Because such sites change frequently, though, it's a good idea to rebuild the ht://Dig database regularly, so that the search index reflects those changes and users always get accurate results. The easiest way to do this is with a "cron" job that runs the "rundig" script periodically. Once a day around midnight is a reasonable choice, since the temporary performance drag of regenerating the index will then affect the fewest users.
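If you'd like cron to log how each rebuild went, a small wrapper around "rundig" can help. The Python sketch below is one way to do it, not part of ht://Dig itself. The script name reindex.py and the /usr/local/bin paths are hypothetical, so check where "rundig" lives on your system. The script simply runs the command it is given and reports the exit status and elapsed time. Because cron typically mails a job's output to its owner, that report will end up in your inbox.

```python
#!/usr/bin/env python3
"""Hypothetical cron wrapper that rebuilds the ht://Dig index via rundig.

Usage (the rundig path is an assumption; check where yours is installed):
    reindex.py /usr/local/bin/rundig

Example crontab entry that runs it every day at midnight:
    0 0 * * * /usr/local/bin/reindex.py /usr/local/bin/rundig
"""
import subprocess
import sys
import time


def reindex(command):
    """Run the re-index command once and return its exit status.

    The command's own output is passed through, so cron can mail it to you.
    """
    started = time.monotonic()
    try:
        status = subprocess.run(command).returncode
    except FileNotFoundError:
        print(f"reindex: cannot find {command[0]}", file=sys.stderr)
        return 127  # conventional shell status for "command not found"
    elapsed = time.monotonic() - started
    print(f"reindex: {command[0]} exited with status {status} after {elapsed:.1f}s")
    return status


if __name__ == "__main__":
    if len(sys.argv) < 2:
        # No command given: print usage instead of guessing a rundig path.
        print("usage: reindex.py /path/to/rundig [arguments...]", file=sys.stderr)
    else:
        sys.exit(reindex(sys.argv[1:]))
```

The wrapper takes the "rundig" path on the command line rather than hard-coding it. This keeps the crontab entry explicit about exactly what runs each night.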
Previous examples have also assumed that ht://Dig was being used to index a single site. If you'd like to index multiple sites, the ht://Dig FAQ suggests two ways to do it. Door #1 involves indexing everything into a single database, and then using the "restrict" and "exclude" parameters in the search form to limit each search to a particular site. Door #2 involves creating a separate database for each site (through separate configuration files), and then using the "config" parameter in the search form to tell "htsearch" which configuration file, and hence which database, to use. Either way, when indexing several sites, it's also a good idea to configure ht://Dig to store shorter descriptions of each page, which reduces the disk space taken up by the search database. See the ht://Dig online FAQ for details on how to do this.
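To make the two approaches concrete, here is a small Python sketch that builds "htsearch" query URLs for each one. It assumes htsearch is reachable at the hypothetical HTSEARCH_URL below and will accept the form fields as a GET query string, which is equivalent to submitting the search form. In a real form, these fields would usually be hidden inputs. The "words" field carries the search terms, while "restrict", "exclude" and "config" are the parameters described above. The site URLs and the "site-a" configuration name are made up for illustration. The "config" value is typically the configuration file's name without its .conf extension, but check your installation.

```python
from urllib.parse import urlencode

# Hypothetical location of the htsearch CGI program on your server.
HTSEARCH_URL = "http://www.example.com/cgi-bin/htsearch"


def shared_db_query(words, restrict="", exclude=""):
    """Door #1: one database for all sites; narrow results by URL pattern."""
    params = {"words": words}
    if restrict:
        params["restrict"] = restrict  # keep only URLs matching this pattern
    if exclude:
        params["exclude"] = exclude    # drop URLs matching this pattern
    return HTSEARCH_URL + "?" + urlencode(params)


def per_site_db_query(words, config):
    """Door #2: one database per site, chosen through its configuration file."""
    return HTSEARCH_URL + "?" + urlencode({"words": words, "config": config})


if __name__ == "__main__":
    print(shared_db_query("cron", restrict="http://www.site-a.com/"))
    # http://www.example.com/cgi-bin/htsearch?words=cron&restrict=http%3A%2F%2Fwww.site-a.com%2F
    print(per_site_db_query("cron", "site-a"))
    # http://www.example.com/cgi-bin/htsearch?words=cron&config=site-a
```

Door #1 keeps a single index to maintain but filters every query. Door #2 keeps each site's database small and independent, at the cost of one configuration file (and one "rundig" run) per site.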