
When You Cannot Get Rid of The Inner Subroutine

In this article we continue to talk about the essential Perl basics that you should know before starting to program for mod_perl.

TABLE OF CONTENTS:
  1. Introduction to mod_perl (part 5): More Perl Basics
  2. my() Scoped Variable in Nested Subroutines
  3. When You Cannot Get Rid of The Inner Subroutine
  4. perldoc's Rarely Known But Very Useful Options
  5. References
By: Stas Bekman
March 12, 2003


First, you might wonder why anyone would need to define an inner subroutine. Well, for example, to reduce some of Perl's script startup overhead you might decide to write a daemon that compiles the scripts and modules only once and caches the pre-compiled code in memory. When a script is to be executed, you just tell the daemon the name of the script to run, and it does the rest, much faster, since compilation has already taken place.

Seems like an easy task, and it is. The only problem is: once the script is compiled, how do you execute it? Or, to put it another way: after it has been executed for the first time and stays compiled in the daemon's memory, how do you call it again? If you could get all developers to code their scripts so that each has a subroutine called run() that actually executes the code in the script, then we would have solved half the problem.

But how does the daemon know how to refer to a specific script if they all run in the main:: namespace? One solution might be to ask the developers to declare a package in each and every script, with the package name derived from the script name. However, since there may be more than one script with the same name residing in different directories, the directory has to be part of the package name too, in order to prevent namespace collisions. And don't forget that a script may be moved from one directory to another, so you would have to make sure that the package name is corrected every time the script is moved.

But why enforce these strange rules on developers when we can arrange for our daemon to do this work? Every script that the daemon is about to execute for the first time should be wrapped in a package whose name is constructed from the mangled path to the script, and in a subroutine called run(). For example, if the daemon is about to execute the script /tmp/hello.pl:

hello.pl
--------
#!/usr/bin/perl
print "Hello\n";

Prior to running it, the daemon will change the code to be:

wrapped_hello.pl
----------------
package cache::tmp::hello_2epl;
sub run {
    #!/usr/bin/perl
    print "Hello\n";
}

The package name is constructed from the prefix cache::, each directory separator slash is replaced with ::, and non-alphanumeric characters are encoded, so that, for example, . (a dot) becomes _2e (an underscore followed by the ASCII code for a dot in hexadecimal representation).

% perl -e 'printf "%x",ord(".")'

prints: 2e. This is the same scheme you see in URL encoding, except that URL encoding uses the % character as the escape prefix (%2E). Since % has a special meaning in Perl (it is the prefix of hash variables), it cannot be used in a package name, so an underscore is used instead.
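The name-mangling rules described above can be sketched in a few lines of Perl. This is only an illustration: the function name path_to_package() is hypothetical, and encoding every character other than letters, digits and the slash is an assumption about what "non-alphanumeric" is meant to cover.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a script path into a package name,
# following the rules described in the text.
sub path_to_package {
    my ($path) = @_;
    (my $name = $path) =~ s{^/+}{};          # drop the leading slash(es)
    # Assumption: encode everything except letters, digits and "/"
    # as an underscore followed by the two-digit hex character code.
    $name =~ s{([^A-Za-z0-9/])}{sprintf "_%02x", ord $1}ge;
    $name =~ s{/+}{::}g;                     # directory separators become ::
    return "cache::$name";
}

print path_to_package('/tmp/hello.pl'), "\n";   # prints cache::tmp::hello_2epl
```

Because the name depends only on the path, the daemon can recompute it on every request instead of storing it anywhere.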

Now when the daemon is requested to execute the script /tmp/hello.pl, all it has to do is build the package name as before, based on the location of the script, and call its run() subroutine:

# the package is already compiled in the daemon's memory,
# so there is no file to load with use or require
cache::tmp::hello_2epl::run();

We have just written a partial prototype of the daemon we wanted. The only outstanding problem is how to pass the path to the script to the daemon. This detail is left as an exercise for the reader.
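The wrap, compile and cache steps described so far might look roughly like the sketch below. It is not the code of any real daemon: run_cached() and the %compiled hash are hypothetical names, the package name is passed in by the caller, and error handling is minimal.

```perl
use strict;
use warnings;

my %compiled;    # package names that have already been compiled

# Hypothetical helper: compile $path once under $package, then call run().
sub run_cached {
    my ($path, $package) = @_;

    unless ($compiled{$package}) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $code = do { local $/; <$fh> };   # slurp the whole script
        close $fh;

        # Wrap the script body in a package and a run() subroutine.
        # Note: the eval'ed code inherits this file's strict and warnings.
        my $wrapped = "package $package;\nsub run {\n$code\n}\n1;\n";
        eval $wrapped or die "Cannot compile $path: $@";
        $compiled{$package} = 1;
    }

    # The package exists only in memory, so look run() up directly
    # instead of loading anything from disk.
    my $run = $package->can('run')
        or die "No run() in $package";
    return $run->();
}
```

A call such as run_cached('/tmp/hello.pl', 'cache::tmp::hello_2epl') reads and compiles the file on the first request only; later requests go straight to the cached run().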

If you are familiar with the Apache::Registry module, you know that it works in almost the same way. It uses a different package prefix, and the generic function is called handler() and not run(). The script to run is identified by the incoming HTTP request.

Now you understand that there are cases where your normal subroutines can become inner ones. Suppose your script was a simple:

simple.pl
---------
#!/usr/bin/perl 
sub hello { print "Hello" }
hello();

Wrapped into a run() subroutine it becomes:

wrapped_simple.pl
-----------------
package cache::simple_2epl;
sub run {
    #!/usr/bin/perl
    sub hello { print "Hello" }
    hello();
}

Therefore, hello() is an inner subroutine, and if you have my() scoped variables that are defined and altered outside hello() and used inside it, the script won't work as you expect starting from the second call, as was explained in the previous section.

Remedies for Inner Subroutines

First of all, there is nothing to worry about as long as you don't forget to turn warnings on. If you do happen to have the "my() Scoped Variable in Nested Subroutines" problem, Perl will alert you with a "Variable ... will not stay shared" warning.
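Since the warning is issued when the code is compiled, it can be collected without running the script at all. The following is a minimal sketch of that idea, not part of any existing tool: shared_variable_warnings() and the scratch package name are hypothetical, and the check relies on matching the text of Perl's warning message.

```perl
use strict;
use warnings;

# Hypothetical helper: compile $code in a throwaway package and return
# any "will not stay shared" warnings that Perl emits while doing so.
sub shared_variable_warnings {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    # The subroutines are only defined here, never called.
    eval "package shared_check_scratch;\n$code\n1;"
        or die "Cannot compile code: $@";
    return grep { /will not stay shared/ } @warnings;
}

my @found = shared_variable_warnings(<<'END_CODE');
sub run {
    my $counter = 0;
    sub increment_counter { $counter++ }
    increment_counter();
}
END_CODE

print "Problem detected: $_" for @found;
```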

Given a script that has this problem, what are the ways to solve it? There are many, and we will discuss some of them here.

We will use the following code to show the different solutions.

multirun.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
    print "run: [time $_]\n";
    run();
}
sub run{
    my $counter = 0;
    increment_counter();
    increment_counter();
    sub increment_counter{
        $counter++;
        print "Counter is equal to $counter !\n";
    }
} # end of sub run

This code executes the run() subroutine three times. Each execution initializes the $counter variable to 0 and then calls the inner subroutine increment_counter() twice. increment_counter() prints $counter's value after incrementing it. One might expect to see the following output:

run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !

But as we have already learned from the previous sections, this is not what we are going to see. Indeed, when we run the script we see:

% ./multirun.pl
Variable "$counter" will not stay shared at ./multirun.pl line 18.
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 3 !
Counter is equal to 4 !
run: [time 3]
Counter is equal to 5 !
Counter is equal to 6 !

Obviously, the $counter variable seen by increment_counter() is not reinitialized on each execution of run(). It retains its value from the previous execution, and increment_counter() increments that.
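The effect can be reduced to a few lines. In the sketch below (outer() and inner() are hypothetical names used only for this illustration), the named inner subroutine keeps seeing the variable from the first call of the enclosing subroutine, no matter what is passed in later. The warning is switched off only to keep the demonstration output clean.

```perl
use strict;
use warnings;
no warnings 'closure';   # silence "will not stay shared" for this demo

sub outer {
    my $value = shift;
    # A named inner subroutine is compiled once and stays bound to the
    # $value that existed during the first call of outer().
    sub inner { return $value }
    return inner();
}

print outer("first"), "\n";    # prints first
print outer("second"), "\n";   # prints first
```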

One workaround is to use globally declared variables, with the vars pragma.

multirun1.pl
-----------
#!/usr/bin/perl -w
use strict;
use vars qw($counter);
for (1..3){
    print "run: [time $_]\n";
    run();
}
sub run {
    $counter = 0;
    increment_counter();
    increment_counter();
    sub increment_counter{
        $counter++;
        print "Counter is equal to $counter !\n";
    }
} # end of sub run

If you run this and the other solutions offered below, the expected output will be generated:

% ./multirun1.pl
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !

By the way, the warning we saw before has gone, and so has the problem, since there is no my() (lexically scoped) variable used in the nested subroutine.

Another approach is to use fully qualified variables. This is better, since less memory will be used, but it adds a typing overhead:

multirun2.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
    print "run: [time $_]\n";
    run();
}
sub run {
    $main::counter = 0;
    increment_counter();
    increment_counter();
    sub increment_counter{
        $main::counter++;
        print "Counter is equal to $main::counter !\n";
    }
} # end of sub run

You can also pass the variable to the subroutine by value and make the subroutine return it after it has been updated. This adds time and memory overhead, so it may not be a good idea if the variable can be very large or if speed of execution is an issue.

Don't rely on the fact that the variable is small during the development of the application; it can grow quite big in situations you don't expect. For example, a very simple HTML form text entry field can return a few megabytes of data if one of your users is bored and wants to test how good your code is. It's not uncommon to see users copy-and-paste 10 MB core dump files into a form's text field and then submit it for your script to process.

multirun3.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
    print "run: [time $_]\n";
    run();
}
sub run {
    my $counter = 0;
    $counter = increment_counter($counter);
    $counter = increment_counter($counter);
    sub increment_counter{
        my $counter = shift;
        $counter++;
        print "Counter is equal to $counter !\n";
        return $counter;
    }
} # end of sub run

Finally, you can use references to do the job. The version of increment_counter() below accepts a reference to the $counter variable and increments its value after first dereferencing it. When you use a reference, the variable you use inside the function is physically the same bit of memory as the one outside the function. This technique is often used to enable a called function to modify variables in a calling function.

multirun4.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
    print "run: [time $_]\n";
    run();
}
sub run {
    my $counter = 0;
    increment_counter(\$counter);
    increment_counter(\$counter);
    sub increment_counter{
        my $r_counter = shift;
        $$r_counter++;
        print "Counter is equal to $$r_counter !\n";
    }
} # end of sub run

Here is yet another, more obscure, way to get the same effect. We modify the value of $counter inside the subroutine by using the fact that the elements of @_ are aliases for the actual scalar parameters. Thus, if you call a function with two arguments, they are stored in $_[0] and $_[1]. In particular, if the element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable, as would be the case when calling the function with a literal, e.g. increment_counter(5)).

multirun5.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
    print "run: [time $_]\n";
    run();
}
sub run {
    my $counter = 0;
    increment_counter($counter);
    increment_counter($counter);
    sub increment_counter{
        $_[0]++;
        print "Counter is equal to $_[0] !\n";
    }
} # end of sub run

The approach given above is generally not recommended, because most Perl programmers will not expect $counter to be changed by the function; the example where we used \$counter, i.e. pass-by-reference, would be preferred.
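The failure mode mentioned for literals is easy to reproduce. The sketch below uses a hypothetical increment_in_place() function, written outside of any enclosing subroutine to keep the example short, and traps the run-time error with eval:

```perl
use strict;
use warnings;

# Increments its argument in place through the @_ alias.
sub increment_in_place {
    $_[0]++;
    return $_[0];
}

my $counter = 0;
increment_in_place($counter);
print "Counter is now $counter\n";          # prints Counter is now 1

# A literal cannot be modified, so this call dies at run time.
my $ok = eval { increment_in_place(5); 1 };
print "Error: $@" unless $ok;
```

The error reported is "Modification of a read-only value attempted", which is one more reason to prefer the explicit reference: the caller can see at the call site that the variable may be changed.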

Here is a solution that avoids the problem entirely by splitting the code into two files; the first is really just a wrapper and loader, the second file contains the heart of the code.

multirun6.pl
-----------
#!/usr/bin/perl -w
use strict;
require 'multirun6-lib.pl';
for (1..3){
    print "run: [time $_]\n";
    run();
}

Separate file:

multirun6-lib.pl
----------------
use strict;
my $counter;
sub run {
    $counter = 0;
    increment_counter();
    increment_counter();
}
sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !\n";
}
1;

Now you have at least six workarounds to choose from.

For more information please refer to the perlref and perlsub manpages.



 
 