Site Search with HTDIG
By: icarus, (c) Melonfire
    2004-04-12

    Table of Contents:
  • Site Search with HTDIG
  • Digging Deep
  • Source Control
  • Script Barf
  • Variable Control
  • A Well-Formed Plan
  • What You See
  • Custom Job
  • Out With The Old
  • Caveat Emptor
  • Ending The Dig


    Site Search with HTDIG - Variable Control
    (Page 5 of 11)

    You can also alter a number of other variables that control ht://Dig's behaviour through the configuration file. Among other things, you can change the location of the search database, list the URLs and file extensions to skip while indexing, enable or disable the fuzzy matching algorithms, limit how much of each document is stored in the search database, and cap the amount of data read over an HTTP connection.
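As a rough illustration of the settings just listed, a configuration fragment might look like the sketch below. The attribute names are standard ht://Dig configuration attributes, but every value shown is an assumption chosen for illustration (including the database path, which simply follows the /usr/local/htdig prefix used in this article), so check them against the documentation for your version before copying anything.

```text
# Illustrative values only; adjust for your own installation.

# Where the search database files are written (assumed path)
database_dir:      /usr/local/htdig/db

# Skip any URL containing one of these patterns
exclude_urls:      /cgi-bin/ .cgi

# Skip documents with these file extensions
bad_extensions:    .gif .jpg .zip .tar .gz .exe

# Matching algorithms and their weights; drop the fuzzy ones
# (endings, synonyms) to leave only exact matching
search_algorithm:  exact:1 endings:0.1 synonyms:0.5

# Bytes of each document stored in the database for excerpts
max_head_length:   10000

# Maximum bytes read from any single document over HTTP
max_doc_size:      200000
```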

    The next step is to build the search database. As noted previously, when indexing a Web site, ht://Dig recursively spiders the site(s) and builds an index of all the unique words it finds. This process is started by the "rundig" script, found in the installation's "bin" directory:

     
    $ /usr/local/htdig/bin/rundig 
    New server: localhost, 80 
    0:0:0:http://localhost/: +* size = 487 
    1:1:1:http://localhost/company/: -+++* size = 2867 
    2:2:2:http://localhost/services/: -***+++++- size = 5219 
    ... 
    htmerge: Sorting... 
    htmerge: Merging... 
    htmerge: 100:creative 
    htmerge: 200:good 
    htmerge: 300:online 
    htmerge: 400:specifically 
    ... 
    htfuzzy/endings: words: 13200 
    htfuzzy/endings 
    htfuzzy/synonyms: 1519 worshipping 
    htfuzzy/synonyms: Done. 
    htfuzzy: Done. 

    The "rundig" script reads the configuration file to determine which URL to use as the root for indexing, then traverses and scans the pages under that URL.
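Before a long indexing run, it can help to confirm which root URL the configuration file actually names. The sketch below is one way to do that from the shell. The "conf_value" function is a hypothetical helper and not part of ht://Dig. It assumes the plain "name: value" layout of the configuration file and deliberately ignores backslash line continuations. The demo runs against a scratch file rather than a real installation.

```shell
#!/bin/sh
# conf_value FILE NAME: print the value of attribute NAME from an
# ht://Dig-style "name: value" configuration file.
# Hypothetical helper (not shipped with ht://Dig). Simplification:
# backslash line continuations are not followed.
conf_value() {
    sed -n "s/^[[:space:]]*$2:[[:space:]]*//p" "$1" | head -n 1
}

# Demo against a scratch file, so no real configuration is touched.
demo_conf="$(mktemp)"
printf '%s\n' \
    '# sample fragment (assumed values)' \
    'start_url:      http://localhost/' \
    'database_dir:   /usr/local/htdig/db' > "$demo_conf"

conf_value "$demo_conf" start_url    # prints http://localhost/
```

Pointing the same function at your real configuration file (its location depends on how ht://Dig was installed) shows the root that "rundig" will start from.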

    Once it finishes, the search database has been created in the installation's "db" directory and is ready for use. The next step is to integrate the ht://Dig search form and form processor into the Web site.
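A quick sanity check after indexing is to confirm that the "db" directory actually received some data. The following is a minimal sketch, not an ht://Dig tool: "db_ready" is a hypothetical helper, and the /usr/local/htdig prefix is an assumption taken from the command shown earlier (override it with the HTDIG_HOME variable, also an invented name). It only tests for at least one non-empty file and does not validate the database contents.

```shell
#!/bin/sh
# db_ready DIR: succeed if DIR exists and holds at least one non-empty file.
# Hypothetical helper; it does not inspect the database format itself.
db_ready() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f -size +0 2>/dev/null | head -n 1)" ]
}

# Assumed install prefix, matching the rundig path used in this article.
HTDIG_HOME="${HTDIG_HOME:-/usr/local/htdig}"

if db_ready "$HTDIG_HOME/db"; then
    echo "search database present in $HTDIG_HOME/db"
else
    echo "no database files in $HTDIG_HOME/db; run $HTDIG_HOME/bin/rundig first"
fi
```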

    © 2003-2008 by Developer Shed. All rights reserved.