In this third part of a four-part series on parsing and regular expressions in Perl, you will learn about cloistered pattern modifiers, boundary assertions, troubleshooting regular expressions, and more. This article is excerpted from chapter one of the book Pro Perl Parsing, written by Christopher M. Frenz (Apress; ISBN: 1590595041).
In the previous section, you saw how to apply pattern modifiers to an entire regular expression. You can also apply these modifiers to just a portion of a regular expression, although the syntax is somewhat different. First, define the subpattern to which the modifier should apply by placing it within a set of parentheses. Then, immediately after the opening parenthesis and before the subpattern, add ?modifiers:, where modifiers is one or more modifier letters such as i or s. For example, to match either ABC or AbC without using alternation, you can write the following:
/A(?i:B)C/
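The following Perl sketch checks which of a few made-up candidate strings the cloistered pattern accepts, and compares the result with the equivalent alternation /A(?:B|b)C/. Only the B inside (?i:...) is case-insensitive; A and C must still match exactly.

```perl
use strict;
use warnings;

# Illustrative strings only; any strings would do.
my @candidates = qw(ABC AbC abc ABc aBC);

# (?i:B) makes just the B case-insensitive.
my @matched = grep { /A(?i:B)C/ } @candidates;

# The same set of strings written with alternation instead.
my @alt = grep { /A(?:B|b)C/ } @candidates;

print "cloistered:  @matched\n";    # cloistered:  ABC AbC
print "alternation: @alt\n";        # alternation: ABC AbC
```

The cloistered form scales better than alternation when the case-insensitive part is longer than a single character, since you do not have to spell out every combination of cases.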
To create a regular expression that allows . to match \n in only part of the expression, you can write something like the following, which allows any character except \n to be matched until an A is encountered:
/.*?A(?s:.*?)BC/
It then allows any character, including \n, to match until BC is encountered.
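As a minimal sketch (the test string is made up for illustration), the code below runs the pattern against a string that has a newline both before and after the A. Because the leading .*? cannot cross a newline, the match starts after the first \n; the cloistered .*? then crosses the second \n to reach BC. The same pattern without (?s:...) fails outright.

```perl
use strict;
use warnings;

my $pattern = qr/.*?A(?s:.*?)BC/;   # /s applies only inside (?s:...)
my $plain   = qr/.*?A.*?BC/;        # same pattern with no /s anywhere

my $text = "x\nyA\nBC";

# Capture $& immediately, before any later match can replace it.
my $cloistered_ok = $text =~ $pattern ? 1 : 0;
my $matched       = $cloistered_ok ? $& : '';

# Without /s, the .*? after A cannot cross the newline before BC.
my $plain_ok = $text =~ $plain ? 1 : 0;

(my $shown = $matched) =~ s/\n/\\n/g;   # make the newline visible
print "cloistered: $cloistered_ok, matched '$shown'\n";  # cloistered: 1, matched 'yA\nBC'
print "no /s:      $plain_ok\n";                          # no /s:      0
```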
Note: Cloistered pattern modifiers are available only in Perl 5.6.0 and later.
Assertions
Assertions differ from the regular expression constructs covered in the preceding sections because they do not match characters in a string; instead, they match positions within the string and consume no characters. For this reason, they are more properly referred to as zero-width assertions.
Assertions let you add positional constraints to a match, such as requiring that it occur at the beginning or end of a string. Table 1-5 summarizes the available assertions.
Table 1-5. Assertions
Assertion    Function
\A, ^        Beginning assertions
\Z, \z, $    Ending assertions
\b           Boundary assertion
\G           Previous match assertion
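To see what "zero-width" means in practice, the sketch below uses the \b boundary assertion from Table 1-5, which matches where a word character meets a non-word character or the edge of the string. It records where each match occurs and how long it is, using Perl's built-in pos() and the @- and @+ offset arrays; the sample string is arbitrary.

```perl
use strict;
use warnings;

my $s = "hello world";
my (@positions, @lengths);

# Perl will not repeat a zero-length //g match at the same position,
# so this loop advances through the string and terminates.
while ($s =~ /\b/g) {
    push @positions, pos($s);          # where the match occurred
    push @lengths,   $+[0] - $-[0];    # characters consumed by the match
}

print "positions: @positions\n";   # positions: 0 5 6 11
print "lengths:   @lengths\n";     # lengths:   0 0 0 0
```

Every match has length 0: the assertion identifies a position without consuming any characters.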
The \A and ^ Assertions
If you want to match only at the beginning of a string, you can use the \A assertion. Similarly, you can use the ^ assertion, known as the beginning-of-line assertion, which also matches at the beginning of a string. When used in conjunction with the /m modifier, ^ can also match immediately after any newline embedded within the string. Thus, the regular expressions /\A123/ and /^123/m can both match the string 123456, but only /^123/m can match the string abd\n123.
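The Perl sketch below tries the two patterns from the text, plus /\A123/m and /^123/ without /m for comparison, against both example strings. It shows that /m changes the behavior of ^ but has no effect on \A.

```perl
use strict;
use warnings;

my @strings  = ("123456", "abd\n123");
my @patterns = (
    [ '/\A123/'  => qr/\A123/  ],
    [ '/\A123/m' => qr/\A123/m ],   # /m does not change \A
    [ '/^123/'   => qr/^123/   ],
    [ '/^123/m'  => qr/^123/m  ],   # /m lets ^ match after "\n"
);

my %results;
for my $p (@patterns) {
    my ($label, $re) = @$p;
    # One yes/no entry per string, in the order of @strings.
    $results{$label} = join ',', map { $_ =~ $re ? 'yes' : 'no' } @strings;
    printf "%-9s %s\n", $label, $results{$label};
}
```

Running it prints yes,no for every pattern except /^123/m, which prints yes,yes.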
The \z, \Z, and $ Assertions
Just as there are assertions for dealing with the beginnings of lines and strings, there are also assertions for dealing with the ends of strings. The first of these is the \z assertion, which matches only at the very end of a string, after any trailing newline. \Z works in a similar fashion, except that it can also match just before a newline character that terminates the string. The final assertion is $, which behaves like \Z, except that the /m modifier enables it to match anywhere in a string that is directly before a newline character. Note that an ending assertion must follow the text it anchors. For example, /321\Z/, /321\z/, and /321$/ can all match the string 654321; if the string were 654321\n, /321\Z/ and /321$/ would still match, but /321\z/ would not. A pattern such as /\Z321/ can never match, because the characters 321 can never appear after the end of the string.
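The following sketch compares the three ending assertions, plus $ under /m, against three strings: one with no newline, one with a trailing newline, and one with an embedded newline. The strings are illustrative and extend the 654321 example from the text.

```perl
use strict;
use warnings;

my @strings  = ("654321", "654321\n", "654321\nxyz");
my @patterns = (
    [ '/321\z/'  => qr/321\z/  ],   # only at the very end
    [ '/321\Z/'  => qr/321\Z/  ],   # end, or before a final "\n"
    [ '/321$/'   => qr/321$/   ],   # like \Z without /m
    [ '/321$/m'  => qr/321$/m  ],   # also before any embedded "\n"
);

my %results;
for my $p (@patterns) {
    my ($label, $re) = @$p;
    # One yes/no entry per string, in the order of @strings.
    $results{$label} = join ',', map { $_ =~ $re ? 'yes' : 'no' } @strings;
    printf "%-9s %s\n", $label, $results{$label};
}
```

Only \z rejects the trailing newline, and only $ under /m matches before the embedded newline in the third string.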