
Modifiers - Perl

In this second part of a four-part series on parsing and regular expression basics in Perl, you'll learn about quantifiers, modifiers, and more. This article is excerpted from chapter one of the book Pro Perl Parsing, written by Christopher M. Frenz (Apress; ISBN: 1590595041).

TABLE OF CONTENTS:
  1. Quantifiers and Other Regular Expression Basics
  2. Predefined Subpatterns
  3. Posix Character Classes
  4. Modifiers
By: Apress Publishing
May 27, 2010


As the name implies, modifiers allow you to alter the behavior of your pattern match in some way. Table 1-4 summarizes the available pattern modifiers.

Table 1-4. Pattern Matching Modifiers

Modifier   Function
/i         Makes the match case-insensitive
/m         Allows $ and ^ to match next to an embedded \n (multiline)
/x         Allows insertion of comments and whitespace in the expression
/o         Evaluates the pattern variable only once
/s         Allows . to match \n (single line)
/g         Allows global matching
/gc        After a failed global search, allows matching to continue from the last match position

For example, under normal conditions, regular expressions are case-sensitive. Therefore, ABC is a completely different string from abc. However, with the aid of the pattern modifier /i, you can make the regular expression behave in a case-insensitive manner. Hence, if you executed the following code, the action contained within the conditional would execute:

if("abc"=~/ABC/i){
   #do something
}

You can use a variety of other modifiers as well. For example, as you will see in the upcoming "Assertions" section, you can use the /m modifier to alter the behavior of the ^ and $ assertions by allowing them to match at line breaks internal to a string, rather than just at the beginning and end of the string. Furthermore, as you saw earlier, the subpattern defined by . normally matches any character other than the newline metasymbol, \n. If you want . to match \n as well, you simply need to add the /s modifier. In fact, when trying to match any multiline document, it is advisable to try the /s modifier first, since its usage will often result in simpler and faster executing code.
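The following is a minimal sketch contrasting the two modifiers on a two-line string. The variable names and sample text are illustrative only; the point is that /m changes where ^ can anchor, while /s changes what . can consume.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the very start of the string
my @plain = $text =~ /^(\w+)/g;     # ("first")
# With /m, ^ also anchors just after each embedded \n
my @multi = $text =~ /^(\w+)/mg;    # ("first", "second")

# Without /s, . stops at \n, so the match cannot span both lines
my $no_s   = ($text =~ /first.*second/)  ? 1 : 0;   # 0
# With /s, . also matches \n
my $with_s = ($text =~ /first.*second/s) ? 1 : 0;   # 1

print "without /m: @plain\n";
print "with /m:    @multi\n";
print "without /s: $no_s, with /s: $with_s\n";
```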

Another useful modifier, one that becomes increasingly important in large loops or in any situation where you repeatedly evaluate the same regular expression, is the /o modifier. Let's consider the following piece of code:

while ($string =~ /$pattern/) {
   # do something (the body must eventually change $string;
   # otherwise a successful match repeats forever)
}

If you executed a segment of code such as this, the regular expression engine would reevaluate the pattern every time the loop came back around. This is not necessarily a bad thing, because, as with any variable, the contents of the $pattern scalar may have changed since the last iteration. However, it is also possible that you have a fixed condition. In other words, the contents of $pattern will not change throughout the course of the script's execution. In this case, you are wasting processing time reevaluating the contents of $pattern on every pass. You can avoid this slowdown by adding the /o modifier to the expression:

while ($string =~ /$pattern/o) {
   # do something
}

In this way, the variable is evaluated only once; after that evaluation, its value remains fixed as far as the regular expression engine is concerned.


Note  When using the /o modifier, make sure you never need to change the contents of the pattern variable. Any changes you make after /o has been employed will not change the pattern used by the regular expression engine.
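To make the caveat in the note concrete, here is a small sketch in which the pattern variable changes between loop iterations. The pattern values 'cat' and 'dog' are arbitrary. With /o, the first value is baked into the compiled regex, so the second pass still searches for 'cat'; without /o, each pass uses the current value.

```perl
use strict;
use warnings;

my @results_o;
my @results_plain;

for my $pattern ('cat', 'dog') {
    # With /o, $pattern is interpolated on the first pass only ('cat'),
    # so the second pass still searches for 'cat'
    push @results_o, ("dog" =~ /$pattern/o) ? "match" : "no match";
    # Without /o, the current value of $pattern is used every time
    push @results_plain, ("dog" =~ /$pattern/) ? "match" : "no match";
}

print "with /o:    @results_o\n";       # no match no match
print "without /o: @results_plain\n";   # no match match
```

Recent Perl versions already skip recompiling a pattern whose interpolated text has not changed, so the saving from /o is usually smaller than it once was, while the risk shown above remains.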


The /x modifier can also be useful when you are creating long or complex regular expressions. This modifier allows you to insert whitespace and comments into your regular expression without the whitespace or # being interpreted as part of the expression. The main benefit of this modifier is improved readability, since you can now write /\w+  |  \d+ /x instead of /\w+|\d+/.
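As a quick check that /x only changes presentation, the sketch below writes the same alternation twice, once compactly and once spread over several lines with comments, and applies both to an arbitrary sample string. It uses qr// to hold each pattern; both return the same matches.

```perl
use strict;
use warnings;

# Same alternation as /\w+|\d+/, spread out with whitespace and comments
my $compact = qr/\w+|\d+/;
my $spaced  = qr/
    \w+     # a run of word characters
  |         # or
    \d+     # a run of digits
/x;

my @from_compact = "abc 123" =~ /$compact/g;
my @from_spaced  = "abc 123" =~ /$spaced/g;

print "@from_compact\n";   # abc 123
print "@from_spaced\n";    # abc 123
```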

The /g modifier is also highly useful, since it allows for global matching to occur. That is, you can continue your search throughout the whole string and not just stop at the first match. I will illustrate this with a simple example from bioinformatics: DNA is made up of a sequence of four kinds of nucleotides, specified by the letters A, T, C, and G. Scientists are often interested in determining the percentage of G and C nucleotides in a given DNA sequence, since this helps determine the thermostability of the DNA (see the following note).


Note  DNA consists of two complementary strands of the nucleotides A, T, C, and G. An A on one strand is always bonded to a T on the opposing strand, and a G on one strand is always bonded to a C on the opposing strand, and vice versa. One difference is that G and C are connected by three hydrogen bonds, whereas A and T are connected by only two. Consequently, DNA with more GC pairs is bound more strongly and is able to withstand higher temperatures, thereby increasing its thermostability.


Thus, I will illustrate the /g modifier by writing a short script that will determine the %GC content in a given sequence of DNA. Listing 1-3 shows the Perl script I will use to accomplish this.

Listing 1-3. Determining %GC Content

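#!/usr/bin/perl
use strict;
use warnings;

my $String = "ATGCCGGGAAATTATAGCG";
my $Count  = 0;

# In scalar context, each //g match resumes after the previous G or C
while ($String =~ /G|C/g) {
   $Count = $Count + 1;
}
my $len = length($String);
my $GC  = $Count / $len;   # fraction of bases that are G or C
print "The DNA sequence has $GC %GC Content\n";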

As you can see, you store your DNA sequence in the scalar variable $String and then use an indeterminate loop to step through the characters of the string. Every time you encounter a G or a C in your string, you increment your counter variable ($Count) by 1. After you have completed your iterations, you divide the number of Gs and Cs by the total sequence length, which gives the GC content as a fraction between 0 and 1, and print your answer. For the previous DNA sequence, the output should be as follows:

The DNA sequence has 0.473684210526316 %GC Content
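An alternative sketch, not the book's listing: in list context, //g returns every match at once, so the count can be taken without an explicit loop. This version also scales the result by 100 so it reads as a true percentage; the variable $percent and the output wording are illustrative choices.

```perl
use strict;
use warnings;

my $String = "ATGCCGGGAAATTATAGCG";

# In list context, //g returns all matches; assigning that list to an
# empty list () and then to a scalar yields the number of matches
my $Count = () = $String =~ /[GC]/g;

my $percent = 100 * $Count / length($String);
printf "GC count: %d of %d (%.1f%% GC)\n", $Count, length($String), $percent;
# prints: GC count: 9 of 19 (47.4% GC)
```

The while loop in Listing 1-3 makes each step of the /g search visible, which suits a teaching example; the list-context form is more compact when only the total is needed.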

Under normal conditions, when a /g match fails to find any more instances of a pattern within a string, the starting position of the next search (the value reported by pos()) is reset back to the beginning of the string. However, if you specify /gc instead of just /g, your next search is not reset to the beginning of the string but instead begins from the position where the last successful match ended.
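The following sketch shows the difference using Perl's built-in pos() function to inspect the search position and the \G assertion to resume from it. The sample string and variable names are arbitrary.

```perl
use strict;
use warnings;

my $text = "cat dog";

# Plain /g: a failed match resets pos() to undef
my $hit  = $text =~ /cat/g;      # true; pos($text) is now 3
my $miss = $text =~ /bird/g;     # false; pos($text) is reset
my $after_g = pos($text);        # undef

# /gc: a failed match leaves pos() where the last match ended
$hit  = $text =~ /cat/g;         # search restarts at 0; pos($text) is 3
$miss = $text =~ /bird/gc;       # false, but pos($text) stays at 3
my $after_gc = pos($text);       # 3

# \G anchors the next match at the preserved position
my $next = ($text =~ /\G\s*(\w+)/gc) ? $1 : undef;   # "dog"

my $shown = defined $after_g ? $after_g : "undef";
print "after /g failure:  $shown\n";      # undef
print "after /gc failure: $after_gc\n";   # 3
print "next word: $next\n";               # dog
```

This pattern of trying one /gc match and, on failure, trying another at the same position is a common way to write simple tokenizers.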

Please check back next week for the continuation of this article.
