Perl
  Home arrow Perl arrow Page 2 - Web Mining with Perl
Dev Shed Forums 
Administration  
Apache  
BrainDump  
DHTML  
Flash  
Java  
JavaScript  
Multimedia  
MySQL  
Oracle  
Perl  
PHP  
Practices  
Python  
Reviews  
Security  
Style-Sheets  
Web Services  
XML  
Zend  
Zope  
Forums Sitemap 
IBM® developerWorks 
Dedicated Servers 
E-Commerce Hosting 
Linux Web Hosting 
Managed Hosting 
Small Business Hosting 
Download TestComplete 
VPS Hosting 
Weekly Newsletter

 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
PERL

Web Mining with Perl
By: Tommie Jones
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 5 stars5 stars5 stars5 stars5 stars / 47
    2002-03-05

    Table of Contents:
  • Web Mining with Perl
  • Accessing The Net (LWP)
  • Cut Along The Table Lines (HTML::TableExtract)
  • Learning From Links (HTML::LinkExtor)
  • Checking For Sameness (String::CRC)
  • Bringing It All Together
  • Conclusion

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
     
    ADVERTISEMENT

    PCmover - $15 Off with Coupon Code CJPH7Q

    Web Mining with Perl - Accessing The Net (LWP)
    (Page 2 of 7 )

    LWP, which stands for the libwww-Perl library, is a common module that may have comes with most installations of Perl. LWP (as quoted from the LWP perldoc) is a collection of Perl modules that provide a consistent and simple application-programming interface to the World Wide Web. LWP provides support for redirection, cookies, basic authentication and robot.txt parsing. For the majority of web-crawling requirements a developer can use LWP::Simple. LWP-Simple allows the developer to store the head or body of a web page (given its URL) in a scalar variable or file. Here is an example.


    #!/usr/bin/perl use LWP::Simple; #Store the output of the web page (html and all) in content my $content = get("http://www.yahoo.com"); if (defined $content) { #$content will contain the html associated with the url mentioned above. print $content; } else { #If an error occurs then $content will not be defined. print "Error: Get failed"; }
    After loading the LWP::Simple module with the use command the get subroutine is called to download the html on the http://www.yahoo.com web site. The html is stored in the $content variable. If there is not an error the $content value is printed to standard output.

    Other modules exist in the LWP::Bundle that handle cookies, automatic redirection and other things. For more information please read the perldoc on LWP::RobotUA and LWP::UserAgent.

    More Perl Articles
    More By Tommie Jones


     

       

    PERL ARTICLES

    - Perl: A Continuing Look at Hashes and Multid...
    - Perl: Another Round with Hashes
    - Perl Hashes
    - Perl Lists: A Final Look at List::Util
    - Perl Lists: Utilizing List::Util
    - Perl Lists: The Split() Function
    - SQL and CGI with Perl and DBI
    - Perl Lists: More Functions and Operators
    - SELECT Queries and Perl
    - Perl Lists: More on Manipulation
    - Creating a Database with Perl and DBI
    - Perl: Sailing the List(less) Seas
    - Perl and DBI
    - Perl: Concatenating Text and More
    - Perl Text: Quoting Without Quote Marks




    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway