Site Search with HTDIG
By: icarus, (c) Melonfire
    2004-04-12


    Table of Contents:
  • Site Search with HTDIG
  • Digging Deep
  • Source Control
  • Script Barf
  • Variable Control
  • A Well-Formed Plan
  • What You See
  • Custom Job
  • Out With The Old
  • Caveat Emptor
  • Ending The Dig

    Site Search with HTDIG - Script Barf
    ( Page 4 of 11 )

    If the "configure" script barfs and spits out messages about "installing the libstdc++ library", even though you're sure the library is already installed (the default situation if you're using GCC 3.x), try re-running the command above with some additional variables:

     
    cd /tmp/htdig-3.1.6
    CXXFLAGS=-Wno-deprecated CPPFLAGS=-Wno-deprecated ./configure \
        --prefix=/usr/local/htdig \
        --with-cgi-bin-dir=/usr/local/apache/cgi-bin \
        --with-image-dir=/usr/local/apache/htdocs/htdig/images \
        --with-image-url-prefix=/htdig/images \
        --with-search-dir=/usr/local/apache/htdocs/htdig/sample

    Next, compile and install it.

     
    make 
    make install 

    ht://Dig should now be installed in the directory "/usr/local/htdig".

    You can verify this with a quick recursive listing of that directory. Here's what you should see:

     
    ls -lR /usr/local/htdig
    total 16 
    drwxr-xr-x 2 root root 4096 Oct 15 18:32 bin
    drwxr-xr-x 2 root root 4096 Oct 15 18:39 common
    drwxr-xr-x 2 root root 4096 Oct 15 18:32 conf
    drwxr-xr-x 2 root root 4096 Oct 15 18:44 db/
     
    /usr/local/htdig/bin
    total 2860 
    -rwxr-xr-x 1 root root 580424 Oct 15 18:32 htdig
    -rwxr-xr-x 1 root root 580424 Oct 15 18:32 htdump
    -rwxr-xr-x 1 root root 390930 Oct 15 18:32 htfuzzy
    -rwxr-xr-x 1 root root 580424 Oct 15 18:32 htload
    -rwxr-xr-x 1 root root 381489 Oct 15 18:32 htmerge
    -rwxr-xr-x 1 root root 376361 Oct 15 18:32 htnotify
    -rwxr-xr-x 1 root root 2158 Oct 15 18:32 rundig*
     
    /usr/local/htdig/common
    total 6248 
    -rw-r--r-- 1 root root 84 Oct 15 18:32 bad_words 
    -rw-r--r-- 1 root root 923308 Oct 15 18:32 english.0 
    -rw-r--r-- 1 root root 5756 Oct 15 18:32 english.aff 
    -rw-r--r-- 1 root root 197 Oct 15 18:32 footer.html 
    -rw-r--r-- 1 root root 891 Oct 15 18:32 header.html 
    -rw-r--r-- 1 root root 194 Oct 15 18:32 long.html 
    -rw-r--r-- 1 root root 1404 Oct 15 18:32 nomatch.html 
    -rw-r--r-- 1 root root 2285568 Oct 15 18:39 root2word.db 
    -rw-r--r-- 1 root root 67 Oct 15 18:32 short.html 
    -rw-r--r-- 1 root root 14481 Oct 15 18:32 synonyms 
    -rw-r--r-- 1 root root 90112 Oct 15 18:39 synonyms.db 
    -rw-r--r-- 1 root root 1275 Oct 15 18:32 syntax.html 
    -rw-r--r-- 1 root root 3022848 Oct 15 18:39 word2root.db 
    -rw-r--r-- 1 root root 1108 Oct 15 18:32 wrapper.html
     
    /usr/local/htdig/conf
    total 12 
    -rw-r--r-- 1 root root 8580 Oct 15 18:42 htdig.conf
     
    /usr/local/htdig/db
    total 236 
    -rw-r--r-- 1 root root 63488 Oct 15 18:44 db.docdb 
    -rw-r--r-- 1 root root 11991 Oct 15 18:42 db.docs 
    -rw-r--r-- 1 root root 5120 Oct 15 18:44 db.docs.index 
    -rw-r--r-- 1 root root 54004 Oct 15 18:44 db.wordlist 
    -rw-r--r-- 1 root root 82944 Oct 15 18:44 db.words.db 


    The Search Binary

    The search binary should have been installed to "/usr/local/apache/cgi-bin/htsearch":

     
    ls -l /usr/local/apache/cgi-bin 
    total 560 
    -rwxr-xr-x 1 root root 558796 Oct 15 18:32 htsearch
    -rw-r--r-- 1 root root 268 Aug 18 16:37 printenv 
    -rw-r--r-- 1 root root 757 Aug 18 16:37 test-cgi 

    A sample search form and images should also have been installed under "/usr/local/apache/htdocs/htdig/".

    For an explanation of what each binary does, see the ht://Dig documentation.

    Once you've got ht://Dig installed, the next step is to configure it and start indexing your site. Let's look at that next.

    Building An Index

    ht://Dig is configured via a single configuration file, named "htdig.conf" and located in the installation's "conf" directory. Most of the time, this configuration file is set up automatically based on the arguments you passed to the "configure" script, and only needs to be altered to reflect the URL at which indexing should begin.

    Pop open this file in your favourite text editor, and look for the "start_url" variable:

     

    # This specifies the URL where the robot (htdig) will start. You can specify 
    # multiple URLs here. Just separate them by some whitespace. 
    # The example here will cause the ht://Dig homepage and related pages to be 
    # indexed. 
    # You could also index all the URLs in a file like so: 
    # start_url: `${common_dir}/start.url` 

    start_url: http://localhost/ 

    Alter this variable to reflect the URL at which indexing should begin, and save the changes back to the file.
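
    If you need to make this change on several machines, it can be scripted instead of done by hand. The snippet below is a rough sketch, not something shipped with ht://Dig: the function name "set_start_url" is invented for this example, and it assumes the active "start_url" setting sits on a single, uncommented line, as in the excerpt above. It keeps a copy of the original file with a ".bak" suffix.

```shell
# Hypothetical helper (not part of ht://Dig): point the active start_url
# line of an htdig.conf at a new URL, keeping a ".bak" copy of the original.
# Assumes start_url occupies one uncommented line.
# Usage: set_start_url /usr/local/htdig/conf/htdig.conf http://www.example.com/
set_start_url() {
    conf="$1"
    url="$2"
    tmp="$conf.tmp.$$"

    # Pass the URL through the environment so awk does not interpret
    # backslashes in it. Commented-out examples begin with "#" and are
    # therefore left untouched by the pattern.
    START_URL="$url" awk '
        /^[ \t]*start_url:/ { print "start_url: " ENVIRON["START_URL"]; next }
        { print }
    ' "$conf" > "$tmp" || return 1

    # Back up the original, then overwrite it in place so that its
    # ownership and permissions are preserved.
    cp "$conf" "$conf.bak" && cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

    For example, "set_start_url /usr/local/htdig/conf/htdig.conf http://www.example.com/" (a placeholder URL) would replace the "http://localhost/" value shown above and leave the commented example lines alone.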



     
     