
Accessing The Net (LWP) - Perl

It is common knowledge that the Internet is a great data source. It is also common knowledge that it is difficult to get the information you want in the format you need. No longer.

TABLE OF CONTENTS:
  1. Web Mining with Perl
  2. Accessing The Net (LWP)
  3. Cut Along The Table Lines (HTML::TableExtract)
  4. Learning From Links (HTML::LinkExtor)
  5. Checking For Sameness (String::CRC)
  6. Bringing It All Together
  7. Conclusion
By: Tommie Jones
March 05, 2002

LWP, short for the libwww-perl library, is a widely used set of modules that is included with most installations of Perl. LWP (as described in the LWP perldoc) is a collection of Perl modules that provide a consistent and simple application-programming interface to the World Wide Web. LWP provides support for redirection, cookies, basic authentication and robots.txt parsing. For the majority of web-crawling requirements a developer can use LWP::Simple. LWP::Simple allows the developer to store the head or body of a web page (given its URL) in a scalar variable or file. Here is an example.


#!/usr/bin/perl
use LWP::Simple;

# Store the output of the web page (html and all) in $content
my $content = get("http://www.yahoo.com");

if (defined $content) {
    # $content will contain the html associated with the url mentioned above.
    print $content;
}
else {
    # If an error occurs then $content will not be defined.
    print "Error: Get failed";
}
After the LWP::Simple module is loaded with the use statement, the get subroutine is called to download the HTML of the http://www.yahoo.com web site. The HTML is stored in the $content variable. If no error occurs, the value of $content is printed to standard output; otherwise get returns undef and the error message is printed instead.
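The same module can also fetch only the headers of a page or write the body straight to a file, as mentioned earlier. The following is a minimal sketch using the head and getstore subroutines that LWP::Simple exports by default. The save_page helper and the yahoo.html file name are made up for illustration and are not part of LWP.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Hypothetical helper (not part of LWP): save the body of $url to $file.
# getstore() returns the HTTP status code; is_success() is true for 2xx codes.
sub save_page {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    return is_success($status);
}

my $url = "http://www.yahoo.com";

# In list context head() returns the content type, document length,
# modification time, expiry time and server name. It returns an empty
# list on failure, and some fields may be undefined.
my ($type, $length) = head($url);
if (defined $type) {
    print "Content type: $type\n";
    print "Length: ", (defined $length ? $length : "unknown"), "\n";
}
else {
    print "Error: Head failed\n";
}

# "yahoo.html" is an arbitrary output file name chosen for this sketch.
if (save_page($url, "yahoo.html")) {
    print "Saved $url to yahoo.html\n";
}
else {
    print "Error: Getstore failed\n";
}
```

Checking the status code with is_success keeps the error handling in one place, which is convenient when the same helper is called for many URLs.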

Other modules in the libwww-perl bundle handle cookies, automatic redirection and other things. For more information please read the perldoc on LWP::RobotUA and LWP::UserAgent.
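As a rough illustration of what LWP::UserAgent adds over LWP::Simple, the sketch below builds a user agent with an in-memory cookie jar and a redirect limit. The agent name, timeout and redirect limit are arbitrary values assumed for this example, not recommendations from the perldoc.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# All option values below are assumptions chosen for illustration.
my $ua = LWP::UserAgent->new(
    agent        => "MyMiner/0.1",        # hypothetical agent name
    timeout      => 30,                   # seconds to wait for a response
    max_redirect => 5,                    # follow at most five redirects
    cookie_jar   => HTTP::Cookies->new,   # in-memory cookie store
);

# get() returns an HTTP::Response object rather than the raw content,
# so the status line is available when something goes wrong.
my $response = $ua->get("http://www.yahoo.com");

if ($response->is_success) {
    print $response->decoded_content;
}
else {
    print "Error: ", $response->status_line, "\n";
}
```

Unlike the plain get of LWP::Simple, the response object reports why a request failed, which helps when a crawler needs to tell a missing page from a network problem.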

 
 