Beginning Perl
By: Dev Shed
    2000-07-14

    Table of Contents:
  • Beginning Perl
  • Regular Expressions
  • Escaping Special Characters
  • Repetition
  • Working with RegExps
  • More Advanced Topics

    Beginning Perl - Regular Expressions
    (Page 2 of 6 )


    "11:15. Restate my assumptions:

    Mathematics is the language of nature.
    Everything around us can be represented and understood through numbers.
    If you graph these numbers, patterns emerge. Therefore: There are patterns everywhere in nature."

    - Max Cohen in Pi, 1998

    Whether or not you agree that Max's assumptions give rise to his conclusion is your own opinion, but his case is much easier to follow in the field of computers - there are certainly patterns everywhere in programming.

    Regular expressions allow us to look for patterns in our data. So far we've been limited to checking a single value against that of a scalar variable or the contents of an array or hash. By using the rules outlined in this chapter, we can use that one single value (or pattern) to describe what we're looking for in more general terms: we can check that every sentence in a file begins with a capital letter and ends with a full stop, find out how many times James Bond's name is mentioned in 'Goldfinger', or learn if there are any repeated sequences of digits more than five long in the decimal representation of π.

    However, regular expressions are a very big area - they're one of the most powerful features of Perl. We're going to break our treatment of them up into six sections:

    Basic patterns
    Special characters to use
    Quantifiers, anchors and memorizing patterns
    Matching, substituting, and transforming text using patterns
    Backtracking
    A quick look at some simple pitfalls

    Generally speaking, if you want to ask perl something about a piece of text, regular expressions are going to be your first port of call - however, there's probably one simple question burning in your head...

    What Are They?
    The term "Regular Expression" (now commonly abbreviated to "RegExp" or even "RE") simply refers to a pattern that follows the rules of syntax outlined in the rest of this chapter. Regular expressions are not limited to perl - Unix utilities such as sed and egrep use the same notation for finding patterns in text. So why aren't they just called 'search patterns' or something less obscure?

    Well, the actual phrase itself originates from the mid-fifties when a mathematician called Stephen Kleene developed a notation for manipulating 'regular sets'. Perl's regular expressions have grown and grown beyond the original notation and have significantly extended the original system, but some of Kleene's notation remains, and the name has stuck.

    Patterns
    History lessons aside, it's all about identifying patterns in text. So what constitutes a pattern? And how do you compare it against something?

    The simplest pattern is a word - a simple sequence of characters - and we may, for example, want to ask perl whether a certain string contains that word. Now, we can do this with the techniques we have already seen: We want to split the string into separate words, and then test to see if each word is the one we're looking for. Here's how we might do that:

    #!/usr/bin/perl
    # match1.plx
    use warnings;
    use strict;
    my $found = 0;
    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";
    my $sought = "people";
    foreach my $word (split) {
        if ($word eq $sought) {
            $found = 1;
            last;
        }
    }
    if ($found) {
        print "Hooray! Found the word 'people'\n";
    }

    Sure enough the program returns success:

    >perl match1.plx

    Hooray! Found the word 'people'
    >

    But that's messy! It's complicated, and it's slow to boot! Worse still, the split function (which breaks each of our lines up into a list of 'words' - we'll see more of this later on in the chapter) actually keeps all the punctuation attached to the words - the word 'you' wouldn't be found by the above, whereas 'you...' would. This looks like a hard problem, but it should be easy. Perl was designed to make easy tasks easy and hard things possible, so there should be a better way to do this. This is how it looks using a regular expression:

    #!/usr/bin/perl
    # match1.plx
    use warnings;
    use strict;
    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

    if ($_ =~ /people/) {
        print "Hooray! Found the word 'people'\n";
    }

    This is much, much easier and yields the same result. We place the text we want to find between forward slashes - that's the regular expression part, our pattern, what we're trying to match. We also need to tell perl which string to look for that pattern in. We do this with the =~ operator. In scalar context, the match returns 1 if it was successful (in our case, if the character sequence 'people' was found in the string) and an empty string, Perl's usual false value, if it wasn't.
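    As a small sketch of how the result of a match can be used, the following stores what =~ hands back in a scalar and tests it afterwards, rather than testing it directly inside an if. The helper name describe_match and the file name are ours, invented for this illustration; they are not part of the original example.

```perl
#!/usr/bin/perl
# matchvalue.plx - hypothetical file name, for illustration only
use warnings;
use strict;

# describe_match is a helper invented for this sketch: it runs a literal
# match for 'people' against $text and describes the outcome in words.
sub describe_match {
    my ($text) = @_;
    my $result = ($text =~ /people/);   # keep the true/false result for later
    return $result ? "matched" : "no match";
}

print describe_match("I do hurt people sometimes"), "\n";
print describe_match("Nobody wants to hurt you"), "\n";
```

    Because the result is just an ordinary true or false value, it can be saved, returned from a subroutine, or combined with other tests like any other condition.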

    Before we go on to more complicated patterns, let's just have a quick look at that syntax. As we noted previously, a lot of Perl's operations take $_ as a default argument, and regular expressions are one such operation. Since we have the text we want to test in $_, we don't need to use the =~ operator to 'bind' the pattern to another string. We could write the above even more simply:

    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

    if (/people/) {
        print "Hooray! Found the word 'people'\n";
    }

    Alternatively, we might want to test for the pattern not matching - the word not being found. Obviously, we could say unless (/people/), but if the text we're looking at isn't in $_, we may also use the negative form of that =~ operator, which is !~. For example:

    #!/usr/bin/perl
    # nomatch.plx
    use warnings;
    use strict;
    my $gibson =
    "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";
    if ($gibson !~ /fish/) {
        print "There are no fish in William Gibson.\n";
    }

    True to form, for cyberpunk books that don't regularly involve fish, we get the expected result:

    >perl nomatch.plx

    There are no fish in William Gibson.
    >
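    To make the relationship between these forms concrete, here is a minimal sketch that asks the same negative question three ways: with !~, by negating =~, and with unless. The variable names are ours, chosen only for this illustration.

```perl
#!/usr/bin/perl
# Three equivalent ways of asking "is the pattern absent?" (illustrative sketch)
use warnings;
use strict;

my $gibson =
    "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

# Each variable ends up as 1 if 'fish' was NOT found, 0 otherwise.
my $with_not_bind = ($gibson !~ /fish/)    ? 1 : 0;   # negative binding
my $with_negation = (!($gibson =~ /fish/)) ? 1 : 0;   # negated positive match
my $with_unless   = 0;
unless ($gibson =~ /fish/) {                          # unless = "if not"
    $with_unless = 1;
}

print "$with_not_bind $with_negation $with_unless\n";
```

    All three tests agree; which one to use is largely a matter of which reads most naturally at that point in the program.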

    Literal text is the simplest regular expression of all to look for, but we needn't look for just the one word - we could look for any particular phrase. However, we need to make sure that we exactly match all the characters: words (with correct capitalization), numbers, punctuation, and even whitespace:

    #!/usr/bin/perl
    # match2.plx
    use warnings;
    use strict;
    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";
    if (/I do/) {
        print "'I do' is in that string.\n";
    }
    if (/sometimes Case/) {
        print "'sometimes Case' matched.\n";
    }

    Let's run this program and see what happens:

    >perl match2.plx
    'I do' is in that string.
    >

    The other string didn't match, even though those two words are there. This is because every character in a regular expression has to match the string, in order and with nothing in between: first "sometimes", then a space, then "Case". In $_, there was a comma before the space, so it didn't match exactly. Similarly, spaces inside the pattern are significant:

    #!/usr/bin/perl
    # match3.plx
    use warnings;
    use strict;
    my $test1 = "The dog is in the kennel";
    my $test2 = "The sheepdog is in the field";
    if ($test1 =~ / dog/) {
        print "This dog's at home.\n";
    }
    if ($test2 =~ / dog/) {
        print "This dog's at work.\n";
    }

    This will only find the first dog, as perl was looking for a space followed by the three letters, 'dog':

    >perl match3.plx
    This dog's at home.
    >
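    The last two examples can be put side by side in one short sketch. It tries a few literal patterns against the same string, some differing only by a comma or an extra space, and reports which ones match. The @checks list and its layout are ours, purely for illustration.

```perl
#!/usr/bin/perl
# Literal patterns must match character for character (illustrative sketch)
use warnings;
use strict;

my $quote = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

# Each entry pairs the pattern text with 'yes' or 'no' for whether it matched.
my @checks = (
    [ 'sometimes Case',  ($quote =~ /sometimes Case/  ? 'yes' : 'no') ],  # comma missing
    [ 'sometimes, Case', ($quote =~ /sometimes, Case/ ? 'yes' : 'no') ],  # exact text
    [ 'hurt  people',    ($quote =~ /hurt  people/    ? 'yes' : 'no') ],  # two spaces
    [ 'hurt people',     ($quote =~ /hurt people/     ? 'yes' : 'no') ],  # one space
);

foreach my $check (@checks) {
    my ($pattern, $matched) = @$check;
    print "/$pattern/ matched? $matched\n";
}
```

    Only the second and fourth patterns match, because they are the only ones that reproduce the punctuation and spacing of the string exactly.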

    So, for the moment, it looks like we shall have to specify our patterns with absolute precision. As another example, look at this:

    #!/usr/bin/perl
    # match4.plx
    use warnings;
    use strict;

    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";
    if (/case/) {
        print "I guess it's just the way I'm made.\n";
    } else {
        print "Case? Where are you, Case?\n";
    }

    > perl match4.plx
    Case? Where are you, Case?
    >


    Hmm, no match. Why not? Because we asked for a small 'c' when we had a big 'C' - regexps are (if you'll pardon the pun) case-sensitive. We can get around this by asking perl to compare insensitively, and we do this by putting an 'i' (for 'insensitive') after the closing slash. If we alter the code above as follows:

    if (/case/i) {
        print "I guess it's just the way I'm made.\n";
    } else {
        print "Case? Where are you, Case?\n";
    }

    then we find him:

    >perl match4.plx
    I guess it's just the way I'm made.
    >

    This 'i' is one of several modifiers that we can add to the end of the regular expression to change its behavior slightly. We'll see more of them later on.
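    As a quick sketch of the difference the modifier makes, the following runs the same pattern with and without /i against one string, and also tries an all-capitals pattern. The variable names are ours, for illustration only.

```perl
#!/usr/bin/perl
# Case-sensitive versus case-insensitive matching (illustrative sketch)
use warnings;
use strict;

my $quote = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

# Without /i the letters must match in case as well as in spelling.
my $sensitive   = ($quote =~ /case/)          ? "found" : "not found";
# With /i, upper and lower case letters are treated as equivalent.
my $insensitive = ($quote =~ /case/i)         ? "found" : "not found";
# The modifier applies to the whole pattern, not just its first letter.
my $shouting    = ($quote =~ /NOBODY WANTS/i) ? "found" : "not found";

print "/case/         : $sensitive\n";
print "/case/i        : $insensitive\n";
print "/NOBODY WANTS/i: $shouting\n";
```

    Note that /i changes only how letters are compared; spaces and punctuation in the pattern still have to match exactly.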

    Interpolation
    Regular expressions work a little like double-quoted strings; variables and metacharacters are interpolated. This allows us to store patterns in variables and determine what we are matching when we run the program - we don't need to have them hard-coded in:

    Try it out - Pattern Tester
    This program will ask the user for a pattern and then test to see if it matches our string. We can use this throughout the chapter to help us test the various different styles of pattern we'll be looking at:

    #!/usr/bin/perl
    # matchtest.plx
    use warnings;
    use strict;
    $_ = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);
    # Tolkien, Lord of the Rings
    print "Enter some text to find: ";
    my $pattern = <STDIN>;
    chomp($pattern);

    if (/$pattern/) {
        print "The text matches the pattern '$pattern'.\n";
    } else {
        print "'$pattern' was not found.\n";
    }

    Now we can test out a few things:

    > perl matchtest.plx
    Enter some text to find: wonder
    The text matches the pattern 'wonder'.

    > perl matchtest.plx
    Enter some text to find: entish
    'entish' was not found.

    > perl matchtest.plx
    Enter some text to find: hough
    The text matches the pattern 'hough'.

    > perl matchtest.plx
    Enter some text to find: and 'no',
    The text matches the pattern 'and 'no','.

    Pretty straightforward, and I'm sure you could all spot those not in $_ as well.

    How It Works
    matchtest.plx has its basis in the three lines:

    my $pattern = <STDIN>;
    chomp($pattern);
    if (/$pattern/) {

    We're taking a line of text from the user. Then, since it will end in a newline, and we don't necessarily want to find a newline in our pattern, we chomp it away. Now we do our test.

    Since we're not using the =~ operator, the test will be looking at the variable $_. The regular expression is /$pattern/, and just like the double-quoted string "$pattern", the variable $pattern is interpolated. Hence, the regular expression is purely and simply whatever the user typed in, once we've got rid of the newline.
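    The same idea can be sketched without any typing at the keyboard. The version below wraps the test in a subroutine and feeds it the patterns from the sample session as a list, binding the pattern to a named variable with =~ instead of relying on $_. The subroutine name find_in is hypothetical, introduced only for this illustration.

```perl
#!/usr/bin/perl
# A non-interactive variant of the pattern tester (illustrative sketch)
use warnings;
use strict;

# find_in is a helper invented for this sketch: it returns 1 if $pattern,
# interpolated into the regular expression, matches somewhere in $text,
# and 0 otherwise.
sub find_in {
    my ($text, $pattern) = @_;
    return ($text =~ /$pattern/) ? 1 : 0;
}

my $text = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);

# The same inputs that were typed in during the sample session.
foreach my $pattern ("wonder", "entish", "hough", "and 'no',") {
    my $verdict = find_in($text, $pattern) ? "matches" : "was not found";
    print "'$pattern' $verdict\n";
}
```

    Taking the patterns from a list rather than from STDIN makes the run repeatable, which is handy when trying out the pattern styles covered later in the chapter.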

    ©1999 Wrox Press Limited, US and UK.
