Boundary Assertions - Perl

In this third part to a four-part series on parsing and regular expressions in Perl, you will learn about cloistered pattern modifiers, boundary assertions, troubleshooting regular expressions, and more. This article is excerpted from chapter one of the book Pro Perl Parsing, written by Christopher M. Frenz (Apress; ISBN: 1590595041).

By: Apress Publishing
June 03, 2010


While assertions dealing with the beginning and end of a string/line are certainly useful, assertions that allow you to deal with positions internal to a string/line are just as important. Several types of assertions can accomplish this, and the first type you will examine is the so-called boundary assertion. The \b boundary assertion allows you to perform matches at any word boundary. A word boundary can exist in two possible forms, since a word has both a beginning and an end. In more technical terms, the beginning of a word is the position inside \W\w, that is, between any nonword character and the word character that follows it. The end of a word has the reverse definition: it is the position inside \w\W, between a word character and the nonword character that follows it. The assertion itself is zero-width, so it consumes no characters, and the start or end of the string counts as a nonword character for this purpose.

When using these assertions, you should keep several considerations in mind. The first is that the underscore character is a part of the \w subpattern, even though it is not an alphanumeric character. Furthermore, you need to be careful using this assertion if you are dealing with contractions, abbreviations, or other wordlike structures, such as Web and e-mail addresses, that have embedded nonalphanumeric characters. According to the \w\W or \W\w pattern, any of the following would contain valid boundaries:

"can't"
www.apress.com
"F.B.I."
"((1+2)*(3-4))"
user@example.com
"(555) 555-5555"
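
To see where these boundaries actually fall, the following minimal sketch marks every \b position in a few of the strings above with a vertical bar. The helper name mark_boundaries and the extra a_b sample are assumptions introduced only for illustration; they are not part of the original listings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: insert a "|" at every word boundary in a string.
sub mark_boundaries {
    my ($text) = @_;
    # \b is zero-width, so the substitution only inserts characters.
    (my $marked = $text) =~ s/\b/|/g;
    return $marked;
}

for my $sample ("can't", 'www.apress.com', 'F.B.I.', 'user@example.com', 'a_b') {
    print mark_boundaries($sample), "\n";
}
```

For "can't" this prints |can|'|t|, which shows the apostrophe splitting the contraction into two separate words, while a_b comes back as |a_b| because the underscore belongs to \w.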

The pos Function and \G Assertion

Before I discuss the remaining assertion, I will first discuss the pos function, since this function and the \G assertion are often used to similar effect. You can use the pos function to either return or set the offset in a string at which the next /g matching operation will start (that is, the offset just past the end of the previous match). To better understand this, consider the code in Listing 1-4.

Listing 1-4. Using the pos Function

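#!/usr/bin/perl
use strict;
use warnings;

my $string = "regular expressions are fun";
pos($string) = 3;    # start the next /g match at offset 3
while ($string =~ /e/g) {
    # pos() is the offset just past the match, so subtract 1
    print "e at position " . (pos($string) - 1) . "\n";
}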

If you execute this script, you get the following output:

e at position 8
e at position 12
e at position 22

Notice how the first e is missing from the output. This is because Listing 1-4 specified the search to begin at position 3, which is after the occurrence of the first e. Hence, when you print the returned matches, you can see that the e at position 1 was never seen by the regular expression engine.
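
The same idea can be wrapped in a small routine that both sets and reads pos. This is only a sketch: the helper name offsets_from and its parameters are hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the zero-based offsets of every occurrence of
# $char in $text, starting the search at offset $start.
sub offsets_from {
    my ($text, $char, $start) = @_;
    my @offsets;
    pos($text) = $start;    # set where the next /g match begins
    while ($text =~ /\Q$char\E/g) {
        # pos() now reports the offset just past the match
        push @offsets, pos($text) - length($char);
    }
    return @offsets;
}

print join(", ", offsets_from("regular expressions are fun", "e", 3)), "\n";
print join(", ", offsets_from("regular expressions are fun", "e", 0)), "\n";
```

The first call prints 8, 12, 22, matching the output above, while starting at offset 0 also reports the e at offset 1.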

The remaining assertion, the \G assertion, is a little more dynamic than the previous assertions in that it does not specify a fixed type of point where matching attempts are allowed to occur. Rather, the \G assertion, when used in conjunction with the /g modifier, anchors a match at the position where your previous match ended (the same offset that pos reports). Let's examine how this works by looking at a list of names followed by phone numbers. Listing 1-5 shows a short script that will search through the list of names until it finds a match. The script will then print the located name and the corresponding phone number.

Listing 1-5. Using the \G Assertion

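#!/usr/bin/perl
use strict;
use warnings;

my $string = <<'LIST';
John (555)555-5555
Bob (234)567-8901
Mary (734)234-9873
Tom (999)999-9999
LIST

my $name = "Mary";
pos($string) = 0;
while ($string =~ /$name/g) {
    # \G anchors this match at the offset where the /g match above ended.
    # The optional space sits outside the capture so $1 holds only the number.
    if ($string =~ /\G\s?(\(?\d{3}\)?[-\s.]?\d{3}[-.]\d{4})/) {
        print "$name $1\n";
    }
}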


Note  As mentioned earlier, parentheses are metacharacters and must be escaped in order to allow the regular expression to match them.


This script begins by creating the $string variable and adding the list of names. Next, you define the $name variable as the name Mary. The next line of code is not always necessary, but it can be if prior matching or other types of string manipulation were previously performed on the string: you use the pos function to set the starting point of the search to the start of the string. Finally, you use a loop structure to search for the name Mary within your $string variable. Once Mary is located, you apply the \G assertion in the conditional statement, which will recognize and print any phone number that is present immediately after Mary. If you execute this script, you should receive the following output:

Mary (734)234-9873
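
As a rough sketch of how this lookup might be reused, the logic can be moved into a subroutine that returns the number instead of printing it. The name find_phone, its arguments, and the added \b anchors around the name are assumptions made for illustration, not part of the original listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the phone number that directly follows $name
# in $list, or undef if the name is absent or not followed by a number.
sub find_phone {
    my ($list, $name) = @_;
    pos($list) = 0;
    while ($list =~ /\b\Q$name\E\b/g) {
        # \G pins this match to the offset where the name match ended.
        if ($list =~ /\G\s?(\(?\d{3}\)?[-\s.]?\d{3}[-.]\d{4})/) {
            return $1;
        }
    }
    return;
}

my $list = "John (555)555-5555\nBob (234)567-8901\nMary (734)234-9873\n";
for my $name (qw(Mary Bob Alice)) {
    my $phone = find_phone($list, $name);
    print "$name: ", (defined($phone) ? $phone : "not found"), "\n";
}
```

Because \G ties the second pattern to the end of the name match, a phone number that appears elsewhere in the list is never attributed to the wrong name; a name with no entry, such as Alice here, simply yields undef.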



 
 