Web Mining with Perl
It is common knowledge that the Internet is a great data source. It is also
common knowledge that it is difficult to get the information you want in the
format you need. No longer.
Any organization that spends money on marketing research or on generating
sales leads can benefit from building a web crawler. Instead of spending tens
of thousands of dollars on a boxed market research survey, an organization can
use a web crawler to ferret out information from the web.
For example:
1. Retail-oriented companies can build web crawlers to find trends mentioned
in web logs.
2. Software consulting companies could crawl industry-specific newsgroups and
mailing lists for potential customers asking for advice.
3. Job placement services could search company sites for an increase in job
postings.
All of these tasks can be accomplished with creative use of Perl and its
abundance of CPAN modules (CPAN, the Comprehensive Perl Archive Network, is
the repository of Perl modules and libraries). This article covers some of the
available CPAN modules and how they can be used to accomplish tasks similar to
the ones above.
Why Perl? Why not? Perl is an excellent tool for a web mining project. Its
basic but powerful built-in data structures, easily accessible regular
expressions, and large selection of CPAN modules comfortably meet the
requirements of such an application.
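As a rough illustration of how built-in hashes and regular expressions fit
together, here is a minimal sketch that counts how often a few terms appear in
a block of text. It uses core Perl only. The sample text, the term list, and
the count_terms helper are all made up for this example; a real crawler would
fetch its text from the web with a CPAN module rather than embed it.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the body of a fetched page.
my $page = <<'END';
Perl makes a web crawler easy to write.
Our crawler is written in perl, and the Crawler runs nightly.
END

# count_terms is an illustrative helper, not part of any CPAN module.
# It returns a hash mapping each term to its number of whole-word,
# case-insensitive occurrences in the text.
sub count_terms {
    my ($text, @terms) = @_;
    my %count;
    for my $term (@terms) {
        # \Q...\E quotes any regex metacharacters in the term;
        # \b anchors the match to word boundaries;
        # /g in list context returns every match, /i ignores case.
        my @hits = $text =~ /\b\Q$term\E\b/gi;
        $count{$term} = scalar @hits;
    }
    return %count;
}

my %count = count_terms($page, qw(perl crawler));
printf "%-8s %d\n", $_, $count{$_} for sort keys %count;
```

The hash keeps one counter per term, and the regular expression does the
matching in a single line. Swapping the embedded text for downloaded pages
would not change this part of the code.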
The rest of this article will discuss some CPAN modules that will be useful
when building a Perl-based web crawler.