ADMINISTRATION

Site Search with HTDIG
By: icarus, (c) Melonfire
    2004-04-12


    Table of Contents:
  • Site Search with HTDIG
  • Digging Deep
  • Source Control
  • Script Barf
  • Variable Control
  • A Well-Formed Plan
  • What You See
  • Custom Job
  • Out With The Old
  • Caveat Emptor
  • Ending The Dig



    Site Search with HTDIG - Caveat Emptor
    ( Page 10 of 11 )

    So far, the examples have assumed that ht://Dig is indexing a Web site made up of static HTML pages. But on today's interactive Web, such sites are far less common than database-backed, highly interactive, content-rich portals. How does ht://Dig fare when faced with one of these?

    The answer, not surprisingly, is quite well. ht://Dig retrieves pages over HTTP, just as a browser does, so it sees only the generated HTML and not the scripts or database behind it. You don't need to do anything special to index a database-driven site: simply give it the starting URL as usual, and the program will follow the links through the dynamically generated content and build an index.

    One thing to remember, however, is that such sites change frequently, so it's a good idea to rebuild the ht://Dig database periodically; that way the search database reflects the changes and users get up-to-date results. This is easily accomplished by adding a "cron" job that executes the "rundig" script on a schedule - perhaps once a day around midnight, when few users will notice the temporary performance drag as the index is regenerated.
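
    As a rough illustration of the nightly rebuild described above, here is a minimal sketch of a wrapper script that cron could call. The script name "reindex.sh", the reindex function, and every path mentioned here are assumptions made for the example, not part of ht://Dig itself; the only ht://Dig piece involved is the "rundig" script, which is run without arguments.

```shell
#!/bin/sh
# reindex.sh: hypothetical cron wrapper around ht://Dig's rundig script.
# Usage: reindex.sh /path/to/rundig /path/to/logfile

# Run the given rundig command, appending its output and a start/finish
# marker to the log file, and pass rundig's exit status back to the caller.
reindex() {
    rundig_cmd=$1
    log_file=$2
    echo "index rebuild started: $(date)" >> "$log_file"
    "$rundig_cmd" >> "$log_file" 2>&1
    status=$?
    echo "index rebuild finished with status $status: $(date)" >> "$log_file"
    return "$status"
}

# Only act when both arguments are supplied, as they would be from cron.
if [ "$#" -ge 2 ]; then
    reindex "$1" "$2"
fi
```

    A crontab line along the following lines would then run it at five past midnight every day (the five fields are minute, hour, day of month, month, and day of week; adjust the assumed paths to your installation):

        5 0 * * * /usr/local/bin/reindex.sh /usr/local/bin/rundig /var/log/rundig.log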

    The earlier examples have also assumed that ht://Dig is being used to index a single site. If you'd like to index multiple sites, the ht://Dig FAQ suggests two ways to accomplish this. Door #1 involves indexing everything into a single database, and then using the "restrict" and "exclude" parameters in the search form to constrain searches on a per-site basis. Door #2 involves creating a separate database for each site (through separate configuration files) and using the "config" parameter in the search form to tell "htsearch" which configuration file, and hence which database, to use. Either way, when indexing multiple sites, it's also a good idea to configure ht://Dig to store shorter descriptions for each page, so as to reduce the disk space taken up by the search database. See the ht://Dig online FAQ for more information on how to do this.
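
    To make the difference between the two doors concrete, the sketch below prints the kind of query URL a search form would end up submitting in each case. The "restrict", "config" and "words" parameters are htsearch's own; the CGI location, the host name "docs.example.com", the configuration name "docs" and the two function names are hypothetical values chosen for the example.

```shell
#!/bin/sh
# Assumed CGI location; adjust to wherever htsearch lives on your server.
HTSEARCH_URL="http://localhost/cgi-bin/htsearch"

# Door #1: one shared database, narrowed to a single site with "restrict".
# $1 = URL pattern that results must match, $2 = search word.
# Both values are assumed to be URL-safe already; no escaping is done here.
url_single_db() {
    printf '%s?restrict=%s&words=%s\n' "$HTSEARCH_URL" "$1" "$2"
}

# Door #2: one database per site, selected with "config".
# $1 = configuration name (htsearch looks for <name>.conf in its
#      configuration directory), $2 = search word.
url_per_site_db() {
    printf '%s?config=%s&words=%s\n' "$HTSEARCH_URL" "$1" "$2"
}

url_single_db "docs.example.com" "cron"
url_per_site_db "docs" "cron"
```

    In a real search form these values would normally travel as hidden form fields rather than hand-built URLs, but the parameters that reach htsearch are the same.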



     
     
     

       






    © 2003-2009 by Developer Shed. All rights reserved.