In this second part of a four-part series on parsing and regular expression basics in Perl, you'll learn about quantifiers, modifiers, and more. This article is excerpted from chapter one of the book Pro Perl Parsing, written by Christopher M. Frenz (Apress; ISBN: 1590595041).
Quantifiers are not the only things that allow you to save some time and typing. The Perl regular expression engine is also able to recognize a variety of predefined subpatterns that you can use to match simple but common patterns. For example, suppose you simply want to match any alphanumeric character. You can write an expression containing the pattern [a-zA-Z0-9], or you can simply use the predefined pattern specified by \w (which additionally matches an underscore). Table 1-2 lists other such useful subpatterns.
Table 1-2. Useful Subpatterns
Specifier    Pattern
\w           Any standard alphanumeric character or an underscore (_)
\W           Any character other than an alphanumeric character or an underscore (_)
\d           Any digit
\D           Any nondigit
\s           Any of \n, \r, \t, \f, and " "
\S           Any other than \n, \r, \t, \f, and " "
.            Any other than \n
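To make the table concrete, here is a minimal sketch that checks a few single characters against each specifier. The helper name matching_specifiers and the sample characters are assumptions chosen for illustration; they do not come from the book.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the book): return the specifiers from
# Table 1-2 that match a single character. \A and \z anchor the match
# to the whole one-character string.
sub matching_specifiers {
    my ($char) = @_;
    my @matched;
    push @matched, '\w' if $char =~ /\A\w\z/;
    push @matched, '\W' if $char =~ /\A\W\z/;
    push @matched, '\d' if $char =~ /\A\d\z/;
    push @matched, '\D' if $char =~ /\A\D\z/;
    push @matched, '\s' if $char =~ /\A\s\z/;
    push @matched, '\S' if $char =~ /\A\S\z/;
    push @matched, '.'  if $char =~ /\A.\z/;
    return @matched;
}

# Illustrative sample characters: a letter, an underscore, a digit,
# a punctuation mark, a tab, and a newline.
for my $char ('a', '_', '7', '-', "\t", "\n") {
    (my $shown = $char) =~ s/\t/\\t/;   # make the tab visible when printed
    $shown =~ s/\n/\\n/;                # make the newline visible when printed
    print "'$shown' matches: ", join(' ', matching_specifiers($char)), "\n";
}
```

Note that the underscore is reported under \w rather than \W, and that the newline is the only sample character the dot does not match.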
These specifiers are quite common in regular expressions, especially when combined with the quantifiers listed in Table 1-1. For example, you can use \w+ to match any word, use \d+ to match any series of digits, or use \s+ to match any run of whitespace. If you want to split the contents of a tab-delimited text file (such as in Figure 1-1) into an array, you can easily perform this task using the split function together with a regular expression involving \s+. The code for this would be as follows:
my @Array;
while (<>) {
    # Split the current line ($_) on runs of whitespace and append each field
    push @Array, split /\s+/;
}
The regular expression argument provided for the split function tells the function where to split the input data and what to leave out of the resultant array. In this case, every run of whitespace is treated as a separator and discarded, so each nonwhitespace region becomes a distinct element in the resultant array.
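The behavior described above can be tried without an input file. The following is a small sketch that assumes two hypothetical in-memory lines stand in for the tab-delimited file of Figure 1-1; the sample values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a tab-delimited file.
my @lines = ("alpha\tbeta\tgamma\n", "1\t2\t3\n");

my @Array;
for my $line (@lines) {
    # Each run of whitespace (tabs and the trailing newline) is a separator;
    # split drops trailing empty fields, so no chomp is needed here.
    push @Array, split /\s+/, $line;
}

print scalar(@Array), " elements: @Array\n";   # prints "6 elements: alpha beta gamma 1 2 3"

# With /\s+/, leading whitespace produces an empty first field;
# the special pattern ' ' skips leading whitespace instead.
my @with_empty = split /\s+/, "  x y";   # ("", "x", "y")
my @without    = split ' ',   "  x y";   # ("x", "y")
print scalar(@with_empty), " vs ", scalar(@without), "\n";   # prints "3 vs 2"
```

If the input lines may begin with whitespace, the single-space string form of the pattern is usually the safer choice, since it avoids the empty leading element.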