Perl Programming Page 4 - Modifiers, Boundaries, and Regular Expressions
The previous examples clearly demonstrate that regular expressions are a powerful and flexible programming tool and are thus widely applicable to a wealth of programming tasks. As you can imagine, however, all this power and flexibility can often make constructing complex regular expressions quite difficult, especially when certain positions within the expression are allowed to match multiple characters and/or character combinations. The construction of robust regular expressions is something that takes practice; but while you are gaining that experience, you should keep in mind a few common types of mistakes:
Even with these guidelines, debugging a complex regular expression can still be a challenge. One of the best, although time-consuming, ways to do this is to draw a visual representation of how the regular expression should work, similar to the state machine figures presented earlier in the chapter (Figure 1-2 through Figure 1-8). If drawing this type of schematic seems too arduous a task, you may want to consider using the GraphViz::Regex module.

GraphViz::Regex

GraphViz is a graphing program developed by AT&T for creating visual representations of structured information such as computer code (http://www.research.att.com/sw/tools/graphviz/). Leon Brocard wrote the GraphViz Perl module, which serves as a Perl-based interface to the GraphViz program. GraphViz::Regex can be useful when coding complex regular expressions, since it creates visual representations of regular expressions via GraphViz. The syntax for using this module is quite straightforward and is demonstrated in the following code snippet:

    use GraphViz::Regex;
    my $regex = '((123|ab(c|C)))';
    my $graph = GraphViz::Regex->new($regex);
    print $graph->as_jpeg;

When you first employ the GraphViz::Regex module, you call the new constructor, which requires a string containing the regular expression you want to see represented graphically. The new method creates a GraphViz object that corresponds to this representation, and the object is assigned to $graph. Lastly, you print the graphical representation you created. This example outputs a JPEG file, but numerous other file types are supported, including GIF, PostScript, PNG, and bitmap.

Caution: The author of the module reports that there are incompatibilities between this module and Perl versions 5.005_03 and 5.7.1.

Tip: Another great tool for debugging regular expressions comes as a component of ActiveState’s programming IDE Komodo.
Komodo contains the Rx Toolkit, which lets you enter a regular expression and a string into separate fields and tells you, as you type, whether they match. This can be a rapid way to determine how well a given expression will match a given string.

Using Regexp::Common

As you can imagine, certain patterns are fairly commonplace and will likely be used repeatedly. This is the basis behind Regexp::Common, a Perl module originally authored by Damian Conway and maintained by Abigail that provides access to a variety of regular expression patterns. Since writing regular expressions can often be tricky, you may want to check this module to see whether a pattern suited to your needs is available. Table 1-7 lists all the expression pattern categories available in version 2.113 of this module.
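As a rough illustration of how a ready-made pattern from Regexp::Common is typically used, the sketch below checks whether a string is a real number. It assumes the module is installed from CPAN and picks the num category's real-number pattern purely as an example; is_real is a hypothetical helper name, not something the module provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common;    # exports the %RE hash by default

# is_real() is a hypothetical helper (not part of Regexp::Common):
# it returns 1 if the whole string is a real number, 0 otherwise.
sub is_real {
    my ($text) = @_;
    my $real = $RE{num}{real};    # ready-made pattern for real numbers
    return $text =~ /^$real\z/ ? 1 : 0;
}

# Try the helper on a few sample strings (assumed test data).
for my $candidate ('3.14', '-2e10', 'abc') {
    printf "%-6s %s\n", $candidate,
        is_real($candidate) ? 'matches' : 'does not match';
}
```

Anchoring the pattern with ^ and \z makes the check apply to the whole string rather than to any substring that happens to look like a number.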
Although Table 1-7 provides a general idea of the different types of patterns, it is a good idea to look at the module description available at CPAN (http://www.cpan.org/). The module operates by generating hash values that correspond to different patterns, and these patterns are stored in the hash %RE.

Regexp::Common::Balanced

This namespace generates regular expressions that match sequences located between balanced parentheses or brackets. The basic syntax needed to access these regular expressions is as follows:

    $RE{balanced}{-parens=>'()[]{}'}

The first part of this hash value refers to the basic regular expression structure needed to match text between balanced delimiters. The second part is a flag that specifies the types of parentheses you want the regular expression to recognize. In this case, it is set to work with (), [], and {}. One application of such a regular expression is in the preparation of publications that contain citations, such as “(Smith et al., 1999).” An author may want to search a document for in-text citations to make sure none are missing from the list of references. You can easily accomplish this by passing the filename of the document to the segment of code shown in Listing 1-7.

Listing 1-7. Pulling Out the Contents of () from a Document

    #!/usr/bin/perl -w
    while(<>){

Note: A more detailed description of the module’s usage will follow in the sections “Standard Usage” and “Subroutine-Based Usage,” since each of the expression types can be accessed through code in the same manner.
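The citation search described above can be sketched along the following lines. This is one possible approach under stated assumptions, not the original listing: it assumes Regexp::Common is installed, restricts the pattern to () only, and relies on the module's -keep flag so that $1 holds the whole matched group. The helper name find_parenthesized and the sample sentence are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common;

# find_parenthesized() is a hypothetical helper (not part of the module):
# it returns every balanced (...) group in a line, delimiters included.
sub find_parenthesized {
    my ($line) = @_;

    # -keep asks Regexp::Common to capture the entire match in $1.
    my $pattern = $RE{balanced}{-parens => '()'}{-keep};

    my @found;
    while ($line =~ /$pattern/g) {
        push @found, $1;
    }
    return @found;
}

# Sample text (assumed); a real script would read lines from a file.
my $sample = 'As shown previously (Smith et al., 1999), and later (Jones, 2001).';
print "$_\n" for find_parenthesized($sample);
```

To scan a whole document, the same helper could be called on each line inside a while (<>) loop, with the filename given on the command line.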
Please check back next week for the conclusion to this article.