Perl Programming Page 8 - Introduction to mod_perl (part 4): Perl Basics
Finally, I want to cover a pitfall that many people have fallen into: the use of regular expressions under mod_perl.

When a regular expression contains an interpolated Perl variable, and it is known that the variable (or variables) will not change during the execution of the program, a standard optimization technique is to add the /o modifier to the regex pattern. This tells Perl to compile the pattern only once, for the entire lifetime of the script, rather than every time the pattern is executed. Consider:
my $pat = '^foo$'; # likely to be input from an HTML form field
foreach( @list ) {
    print if /$pat/o;
}
This is usually a big win in loops over lists, or when using the grep() or map() operators.

In long-lived mod_perl scripts, however, the variable may change with each invocation, and this can pose a problem. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by that child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is supposed to depend on. Your script will appear to be broken.

There are two solutions to this problem.

The first is to use eval q{} to force the code to be evaluated each time. Just make sure that the eval block covers the entire processing loop, and not just the pattern match itself. The above code fragment would be rewritten as:
my $pat = '^foo$';
eval q{
    foreach( @list ) {
        print if /$pat/o;
    }
};
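To see both the problem and the eval fix outside of a web server, here is a minimal sketch that simulates two requests by calling the same sub twice within one Perl process. The sub names grep_with_o and grep_with_eval are made up for illustration, and grep is used instead of print so the results can be compared.

```perl
use strict;
use warnings;

# Each call below stands in for one request served by the same
# long-lived interpreter (an assumption made for this sketch).

# Pitfall: /o compiles the pattern on first use and never again.
sub grep_with_o {
    my ($pat, @list) = @_;
    return grep { /$pat/o } @list;
}

# Fix: the string eval recompiles the whole statement once per call,
# so /o only keeps the pattern for the duration of that one call.
sub grep_with_eval {
    my ($pat, @list) = @_;
    my @hits = eval q{ grep { /$pat/o } @list };
    die $@ if $@;
    return @hits;
}

my @list = qw(foo bar);

my @o1 = grep_with_o('^foo$', @list);      # first "request"
my @o2 = grep_with_o('^bar$', @list);      # second "request", new pattern
my @e1 = grep_with_eval('^foo$', @list);
my @e2 = grep_with_eval('^bar$', @list);

print "with /o:   @o1 | @o2\n";
print "with eval: @e1 | @e2\n";
```

The second /o call should still return foo, because the pattern from the first call is kept, while the eval version should return bar for the second call.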
Just saying:
foreach( @list ) {
    eval q{ print if /$pat/o; };
}
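means that I recompile the regex for every element in the list, even though the regex doesn't change.

You can use this approach if you require more than one pattern match operator in a given section of code. If the section contains only one operator (be it an m// or s///), you can rely on a property of the null pattern: it reuses the last successfully matched pattern. This leads to the second solution, which also eliminates the use of eval. The above code fragment becomes: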
my $pat = '^foo$';
"foo" =~ /$pat/; # dummy match (MUST NOT FAIL!); "foo" stands for any string known to match $pat
foreach( @list ) {
    print if //;
}
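The null-pattern approach can be wrapped in a small helper, sketched here under the assumption that the caller can supply a sample string known to match the pattern. The name grep_with_null_pattern and its argument order are hypothetical. The helper dies when the dummy match fails instead of carrying on with an unprimed pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the elements of @list that match $pat.
# $sample must be a string that is known to match $pat.
sub grep_with_null_pattern {
    my ($pat, $sample, @list) = @_;

    # Dummy match: no /o, so the pattern is recompiled when $pat changes.
    $sample =~ /$pat/
        or die "dummy match failed for pattern '$pat'";

    my @hits;
    foreach (@list) {
        # Empty pattern: reuses the last successfully matched pattern.
        push @hits, $_ if //;
    }
    return @hits;
}

my @first  = grep_with_null_pattern('^foo$', 'foo', qw(foo bar food));
my @second = grep_with_null_pattern('^ba',   'bar', qw(foo bar baz));

print "first:  @first\n";
print "second: @second\n";
```

Because the dummy match is repeated on every call, the second call should pick up the new pattern rather than the one from the first call.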
The only gotcha is that the dummy match that boots the regular expression engine must absolutely, positively succeed; otherwise the pattern will not be cached, and the // will match everything. If you can't count on fixed text to ensure the match succeeds, you have two possibilities.

If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match:
$pat =~ /\Q$pat\E/; # guaranteed if no meta-characters present

If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the non-searchable \377 character as follows:
"\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present

Another approach: it depends on the complexity of the regex to which you apply this technique. One common usage where a compiled regex is usually more efficient is to "match any one of a group of patterns" over and over again.

Maybe with a helper routine, it's easier to remember. Here is one, slightly modified from Jeffrey Friedl's example in his book "Mastering Regular Expressions".
#####################################################
# Build_MatchMany_Function
# -- Input: list of patterns
# -- Output: A code ref which matches its $_[0]
# against ANY of the patterns given in the
# "Input", efficiently.
#
sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
}
Example usage:
@some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
$Known_Browser = Build_MatchMany_Function(@some_browsers);
while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! $Known_Browser->($browser) ) {
        print STDERR "Unknown Browser: $browser\n";
    }
    # ...
}
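For a version that runs without an access log, the following sketch repeats the builder and feeds it a few made-up user-agent strings. The @agents list and the shortened browser list are assumptions for illustration, not data from a real log.

```perl
use strict;
use warnings;

# Same builder as in the article, repeated so the sketch runs on its own.
sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join ' || ', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0 .. $#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    return $matchsub;
}

my $known_browser = Build_MatchMany_Function(qw(Mozilla Lynx MSIE));

# Made-up user-agent strings standing in for fields parsed from a log.
my @agents = ('Mozilla/4.0 (compatible)', 'Lynx/2.8', 'SomeBot/1.0');

# Collect every agent that none of the patterns matches.
my @unknown = grep { !$known_browser->($_) } @agents;
print "Unknown Browser: $_\n" for @unknown;
```

Each call to Build_MatchMany_Function evals a fresh sub, so the /o modifiers inside one generated matcher do not affect a matcher built later from a different pattern list.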
In the next article I'll present a few other Perl basics directly related to mod_perl programming.