Introduction to mod_perl (part 5): More Perl Basics
By: Stas Bekman
    2003-03-12


    Table of Contents:
  • Introduction to mod_perl (part 5): More Perl Basics
  • my() Scoped Variable in Nested Subroutines
  • When You Cannot Get Rid of The Inner Subroutine
  • perldoc's Rarely Known But Very Useful Options
  • References



    When You Cannot Get Rid of The Inner Subroutine

    First, you might wonder why anyone would need to define an inner subroutine at all. Consider Perl's script startup overhead: to reduce it, you might decide to write a daemon that compiles the scripts and modules only once and caches the precompiled code in memory. When a script is to be executed, you just tell the daemon the name of the script to run; it does the rest, and does it much faster because compilation has already taken place.

    Seems like an easy task, and it is. The only problem is: once the script is compiled, how do you execute it? Or, to put it the other way: after it has been executed for the first time and stays compiled in the daemon's memory, how do you call it again? If you could get all developers to code their scripts so that each has a subroutine called run() that actually executes the code in the script, then we would have solved half the problem.

    But how does the daemon refer to a specific script if they all run in the main:: namespace? One solution might be to ask the developers to declare a package in each and every script, with the package name derived from the script name. However, two scripts with the same name may reside in different directories, so to prevent namespace collisions the directory has to be a part of the package name too. And don't forget that a script may be moved from one directory to another, so you would have to make sure that the package name is corrected every time the script gets moved.

    But why enforce these strange rules on developers, when we can arrange for our daemon to do this work? For every script that the daemon is about to execute for the first time, the daemon wraps the script's code in a subroutine called run(), inside a package whose name is constructed from the mangled path to the script. For example, if the daemon is about to execute the script /tmp/hello.pl:

      hello.pl
      --------
      #!/usr/bin/perl
      print "Hello\n";

    Prior to running it, the daemon will change the code to be:

      wrapped_hello.pl
      ----------------
      package cache::tmp::hello_2epl;

      sub run{
        #!/usr/bin/perl 
        print "Hello\n";
      }

    The package name is constructed as follows: it starts with the prefix cache::, each directory separator slash is replaced with ::, and non-alphanumeric characters are encoded so that, for example, . (a dot) becomes _2e (an underscore followed by the ASCII code for a dot in hex representation).

     % perl -e 'printf "%x",ord(".")'

    prints: 2e. This is the same scheme you see in URL encoding, except that URL encoding uses the % character as the prefix (%2E). Since % has a special meaning in Perl (it is the prefix of hash variables), it couldn't be used, so an underscore takes its place.
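
    The mangling rule can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the helper name path_to_package() is hypothetical, and the sketch encodes every non-alphanumeric character (including the underscore itself) so that two different paths cannot collapse into the same package name.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a package name from a script path.
sub path_to_package {
    my ($path) = @_;

    # Split on the directory separator and drop empty components
    # (such as the one produced by the leading slash).
    my @parts = grep { length } split m{/}, $path;

    for my $part (@parts) {
        # Encode each non-alphanumeric character as "_" plus its hex code.
        $part =~ s/([^A-Za-z0-9])/sprintf("_%02x", ord $1)/ge;
    }

    return join '::', 'cache', @parts;
}

print path_to_package('/tmp/hello.pl'), "\n";    # cache::tmp::hello_2epl
```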

    Now when the daemon is requested to execute the script /tmp/hello.pl, all it has to do is to build the package name as before based on the location of the script and call its run() subroutine:

      # the package was compiled into the daemon's memory, so there is no file to load
      cache::tmp::hello_2epl::run();

    We have just written a partial prototype of the daemon we wanted. The only outstanding problem is how to pass the path to the script to the daemon. This detail is left as an exercise for the reader.
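
    As a rough illustration of the compile-once idea, the sketch below reads a script, wraps it in a package and a run() subroutine as described above, compiles it with a string eval, and afterwards only calls run(). All names here (run_cached(), path_to_package(), %compiled) are hypothetical, error handling is minimal, and a temporary file stands in for a real script so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %compiled;    # package name => 1 once the script has been compiled

# Hypothetical helper: same mangling rule as described in the text.
sub path_to_package {
    my ($path) = @_;
    my @parts = grep { length } split m{/}, $path;
    s/([^A-Za-z0-9])/sprintf("_%02x", ord $1)/ge for @parts;
    return join '::', 'cache', @parts;
}

# Hypothetical daemon core: compile on the first request, reuse afterwards.
sub run_cached {
    my ($path) = @_;
    my $package = path_to_package($path);

    unless ($compiled{$package}) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $code = do { local $/; <$fh> };    # slurp the whole script
        close $fh;

        # Wrap the script body in its own package and a run() subroutine.
        eval "package $package;\nsub run {\n$code\n}\n1;"
            or die "Cannot compile $path: $@";
        $compiled{$package} = 1;
    }

    return $package->can('run')->();
}

# Demo: a throwaway script that is compiled once and executed twice.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} qq{#!/usr/bin/perl\nprint "Hello\\n";\n};
close $fh;

run_cached($script) for 1 .. 2;
```

    The second call skips the file read and the eval entirely, which is the saving the daemon is after.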

    If you are familiar with the Apache::Registry module, you know that it works in almost the same way. It uses a different package prefix and the generic function is called handler() and not run(). The scripts to run are passed through the HTTP protocol's headers.

    Now you can see how your normal subroutines can become inner subroutines. Suppose your script is as simple as:

      simple.pl
      ---------
      #!/usr/bin/perl 
      sub hello { print "Hello" }
      hello();

    Wrapped into a run() subroutine it becomes:

      wrapped_simple.pl
      -----------------
      package cache::simple_2epl;

      sub run{
        #!/usr/bin/perl 
        sub hello { print "Hello" }
        hello();
      }

    Therefore, hello() is now an inner subroutine. If it uses my() scoped variables that are declared and altered outside hello(), it won't work as you expect from the second call onward, as was explained in the previous section.

    Remedies for Inner Subroutines

    First of all, there is nothing to worry about as long as you don't forget to turn warnings on. If you do happen to have the ``my() Scoped Variable in Nested Subroutines'' problem, Perl will always alert you.

    If you have a script with this problem, how can you solve it? There are many ways, and we will discuss some of them here.

    We will use the following code to show the different solutions.

      multirun.pl
      -----------
      #!/usr/bin/perl -w

      use strict;
    
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
    
      sub run{
    
        my $counter = 0;
    
        increment_counter();
        increment_counter();
    
        sub increment_counter{
          $counter++;
          print "Counter is equal to $counter !\n";
        }
    
      } # end of sub run

    This code executes the run() subroutine three times. Every time it is executed, run() initializes the $counter variable to 0 and then calls the inner subroutine increment_counter() twice. increment_counter() prints $counter's value after incrementing it. One might expect to see the following output:

      run: [time 1]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 2]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 3]
      Counter is equal to 1 !
      Counter is equal to 2 !

    But as we have already learned from the previous sections, this is not what we are going to see. Indeed, when we run the script we see:

      % ./multirun.pl
      Variable "$counter" will not stay shared at ./multirun.pl line 18.
      run: [time 1]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 2]
      Counter is equal to 3 !
      Counter is equal to 4 !
      run: [time 3]
      Counter is equal to 5 !
      Counter is equal to 6 !

    Obviously, the $counter variable is not reinitialized on each execution of run(). It retains its value from the previous execution, and sub increment_counter() increments that.
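
    The effect can be reproduced in isolation. The sketch below uses hypothetical names (outer(), inner(), @log) and switches off the closure warning only to keep the output clean; it records what the outer subroutine sees next to what the named inner subroutine sees.

```perl
use strict;
use warnings;
no warnings 'closure';    # hide "will not stay shared" for this demo only

my @log;    # pairs of [value in outer(), value seen by inner()]

sub outer {
    my $x = shift;

    # A named sub is compiled once and stays bound to the first $x.
    sub inner { return $x }

    push @log, [ $x, inner() ];
}

outer($_) for 1 .. 3;
print "outer sees $_->[0], inner sees $_->[1]\n" for @log;
# outer sees 1, inner sees 1
# outer sees 2, inner sees 1
# outer sees 3, inner sees 1
```

    Only during the first call do the two subroutines share the same variable; afterwards outer() gets a fresh $x on every call while inner() keeps the original one.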

    One of the workarounds is to use globally declared variables, with the vars pragma.

      multirun1.pl
      ------------
      #!/usr/bin/perl -w

      use strict;
      use vars qw($counter);
    
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
    
      sub run {
    
        $counter = 0;
    
        increment_counter();
        increment_counter();
    
        sub increment_counter{
          $counter++;
          print "Counter is equal to $counter !\n";
        }
    
      } # end of sub run

    If you run this and the other solutions offered below, the expected output will be generated:

      % ./multirun1.pl
      run: [time 1]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 2]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 3]
      Counter is equal to 1 !
      Counter is equal to 2 !

    By the way, the warning we saw before has gone, and so has the problem, since there is no my() (lexically defined) variable used in the nested subroutine.
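
    On Perl 5.6 and later, the same effect is commonly obtained with an our declaration instead of the vars pragma. The sketch below is a variant of multirun1.pl written for illustration, not part of the original listings: it returns the counter values instead of printing them inside increment_counter(), so that the result is easy to check.

```perl
use strict;
use warnings;

our $counter;    # package variable, usable under strict without qualification

sub run {
    $counter = 0;

    # The nested sub touches a package variable, so there is no
    # lexical variable for it to capture.
    return ( increment_counter(), increment_counter() );

    sub increment_counter {
        return ++$counter;
    }
}

for my $time (1 .. 3) {
    my @values = run();
    print "run: [time $time] -> @values\n";
}
# run: [time 1] -> 1 2
# run: [time 2] -> 1 2
# run: [time 3] -> 1 2
```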

    Another approach is to use fully qualified variables. This is better, since less memory will be used, but it adds a typing overhead:

      multirun2.pl
      ------------
      #!/usr/bin/perl -w

      use strict;
    
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
    
      sub run {
    
        $main::counter = 0;
    
        increment_counter();
        increment_counter();
    
        sub increment_counter{
          $main::counter++;
          print "Counter is equal to $main::counter !\n";
        }
    
      } # end of sub run

    You can also pass the variable to the subroutine by value and make the subroutine return it after it has been updated. This adds time and memory overheads, so it may not be a good idea if the variable can be very large, or if speed of execution is an issue.

    Don't rely on the fact that the variable is small during the development of the application; it can grow quite big in situations you don't expect. For example, a very simple HTML form text entry field can return a few megabytes of data if one of your users is bored and wants to test how good your code is. It's not uncommon to see users copy-and-paste 10MB core dump files into a form's text fields and then submit them for your script to process.

      multirun3.pl
      ------------
      #!/usr/bin/perl -w

      use strict;
    
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
    
      sub run {
    
        my $counter = 0;
    
        $counter = increment_counter($counter);
        $counter = increment_counter($counter);
    
        sub increment_counter{
          my $counter = shift;
    
          $counter++;
          print "Counter is equal to $counter !\n";
    
          return $counter;
        }
    
      } # end of sub run

    Finally, you can use references to do the job. The version of increment_counter() below accepts a reference to the $counter variable and increments its value after first dereferencing it. When you use a reference, the variable you use inside the function is physically the same bit of memory as the one outside the function. This technique is often used to enable a called function to modify variables in a calling function.

      multirun4.pl
      ------------
      #!/usr/bin/perl -w

      use strict;
    
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
    
      sub run {
    
        my $counter = 0;
    
        increment_counter(\$counter);
        increment_counter(\$counter);
    
        sub increment_counter{
          my $r_counter = shift;
    
          $$r_counter++;
          print "Counter is equal to $$r_counter !\n";
        }
    
      } # end of sub run

    Here is yet another, more obscure way of using references. We modify the value of $counter inside the subroutine by using the fact that the elements of @_ are aliases for the actual scalar parameters. Thus if you called a function with two arguments, those would be stored in $_[0] and $_[1]. In particular, if the element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable, as would be the case when calling the function with a literal, e.g. increment_counter(5)).

      multirun5.pl
      ------------
      #!/usr/bin/perl -w

      use strict;
    
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
    
      sub run {
    
        my $counter = 0;
    
        increment_counter($counter);
        increment_counter($counter);
    
        sub increment_counter{
          $_[0]++;
          print "Counter is equal to $_[0] !\n";
        }
    
      } # end of sub run

    The approach given above is generally not recommended, because most Perl programmers will not expect $counter to be changed by the function. The earlier example that passes \$counter, i.e. pass-by-reference, is preferred.
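
    The caveat about literals is easy to check. In the sketch below (a standalone illustration, not one of the multirun scripts), the call with a variable succeeds, while the call with a literal is wrapped in eval so that the failure can be reported instead of aborting the program.

```perl
use strict;
use warnings;

sub increment_counter {
    $_[0]++;    # modifies the caller's variable through the @_ alias
    return $_[0];
}

my $counter = 0;
increment_counter($counter);
print "Counter is now $counter\n";    # Counter is now 1

# A literal cannot be modified; perl normally reports
# "Modification of a read-only value attempted".
my $ok = eval { increment_counter(5); 1 };
print "Literal argument rejected: $@" unless $ok;
```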

    Here is a solution that avoids the problem entirely by splitting the code into two files; the first is really just a wrapper and loader, the second file contains the heart of the code.

      multirun6.pl
      ------------
      #!/usr/bin/perl -w

      use strict;
      # the explicit ./ is needed on Perl 5.26+, where . is no longer in @INC
      require './multirun6-lib.pl';
    
      for (1..3){
        print "run: [time $_]\n";
        run();
      }

    Separate file:

      multirun6-lib.pl
      ----------------
      use strict;

      my $counter;

      sub run {
        $counter = 0;
        increment_counter();
        increment_counter();
      }

      sub increment_counter{
        $counter++;
        print "Counter is equal to $counter !\n";
      }

      1;

    Now you have at least six workarounds to choose from.

    For more information, please refer to the perlref and perlsub manpages.



     
     




    © 2003-2009 by Developer Shed. All rights reserved.