In this second part of a four-part series on parsing and regular expression basics in Perl, you'll learn about quantifiers, modifiers, and more. This article is excerpted from chapter one of the book Pro Perl Parsing, written by Christopher M. Frenz (Apress; ISBN: 1590595041).
In the previous section, you saw the classic predefined Perl patterns, but more recent versions of Perl also support some predefined subpattern types through a set of Posix character classes. Table 1-3 summarizes these classes, and I outline their usage after the table.
Table 1-3. Posix Character Classes

Posix Class   Pattern
[:alnum:]     Any letter or digit
[:alpha:]     Any letter
[:ascii:]     Any character with a numeric encoding from 0 to 127
[:cntrl:]     Any control character (a numeric encoding less than 32, or 127)
[:digit:]     Any digit from 0 to 9 (\d)
[:graph:]     Any letter, digit, or punctuation character
[:lower:]     Any lowercase letter
[:print:]     Any letter, digit, punctuation, or space character
[:punct:]     Any punctuation character
[:space:]     Any space character (\s)
[:upper:]     Any uppercase letter
[:word:]      Underscore or any letter or digit
[:xdigit:]    Any hexadecimal digit (that is, 0–9, a–f, or A–F)
Note: You can use Posix character classes in conjunction with Unicode text. When doing so, however, keep in mind that a class such as [:alpha:] may match more characters than you expect, since Unicode contains many more letters than ASCII does. The same holds true for the other classes that match letters and digits.
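To make the point in the note concrete, here is a minimal sketch that counts letter matches in a string mixing ASCII and Greek letters. The sample string and variable names are illustrative assumptions, not part of the book's text. Note the doubled brackets around the class name, which the next paragraph explains.

```perl
use strict;
use warnings;

# Assumed sample data: three ASCII letters followed by Greek alpha and beta.
my $text = "abc\x{3B1}\x{3B2}";

# An explicit ASCII range sees only a, b, and c.
my $ascii_count = () = $text =~ /[a-zA-Z]/g;

# The Posix class also accepts the two Greek letters.
my $posix_count = () = $text =~ /[[:alpha:]]/g;

print "ASCII range: $ascii_count, [:alpha:]: $posix_count\n";
```

Run as written, this prints "ASCII range: 3, [:alpha:]: 5", because the string contains code points above 255 and Perl therefore applies Unicode rules when matching the class.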
Using Posix character classes is similar to the previous examples in which a range of characters was defined, such as [A-F], in that the characters must be enclosed in brackets. This is sometimes a point of confusion for people who are new to Posix character classes because, as you saw in Table 1-3, all the classes already have brackets. That set of brackets is part of the class name, not part of the Perl regex. Thus, you need a second set, as in the following regular expression, which will match any number of digits: