Returns and Perl Subroutines

In this final part of a three-part series covering subroutines in Perl, we will discuss returns and return values, as well as prototypes. This article is excerpted from chapter nine of the book Perl Best Practices, written by Damian Conway (O’Reilly; ISBN: 0596001738). Copyright © 2006 O’Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O’Reilly Media.

Multi-Contextual Return Values


When there is no “obvious” scalar context return value, consider Contextual::Return instead.


Sometimes no single scalar return value is appropriate for a list-returning subroutine. Your play-testers simply can’t agree: different developers consistently expect different behaviours in different scalar contexts.

For example, suppose you’re implementing a get_server_status() subroutine that normally returns its information as a heterogeneous list:

  # In list context, return all the available information…
  my ($name, $uptime, $load, $users) = get_server_status($server_ID);

You may find that, in scalar contexts, some programmers expected it to return its numeric load value:

  # Total load is sum of individual server loads…
  $total_load += get_server_status($server_ID);

Others assumed it would return a boolean value indicating whether the server is up:

  # Skip inactive servers…
  next SERVER if ! get_server_status($server_ID);

Still others anticipated a string summarizing the current status:

  # Compile report on all servers…
  $servers_summary .= get_server_status($server_ID) . "\n";

While a fourth group hoped for a hash-reference, to give them convenient named access to the particular server information they wanted:

  # Total users is sum of users on each server…
  $total_users += get_server_status($server_ID)->{users};

In such cases, implementing any one of these four expectations is going to leave three-quarters of your developers unhappy.

At some point, every subroutine will be called in scalar context, and will have to return something. If that something isn’t obvious to the majority of people, then inexperienced developers (who might not even realize their call is in scalar context) will suffer. And experienced developers will suffer too: hamstrung by the limitations of scalar context return and forced to work with your arbitrary choice of return value.

Perl’s subroutines are context-sensitive for a reason: so that they can Do The Right Thing when used in different ways. But often in scalar contexts there is no one Right Thing. So developers give up and just pick the One Thing That Seems Rightest… to them. All too often, a decision like that leads to confusion, frustration, and buggy code.

Surprisingly, the underlying problem here isn’t that Perl is context-sensitive. The problem is that Perl isn’t context-sensitive enough.

Perl has one kind of list context and one kind of void context, so simple list-context and void-context returns are the perfect tools for those. On the other hand, Perl has at least a dozen distinct scalar subcontexts: boolean, integer, floating-point, string, and the numerous reference types. So, unless one of those return types is the clear and obvious candidate, simple scalar context return is totally inadequate: a sledgehammer when you really need tweezers.
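For comparison, the built-in wantarray can only tell a subroutine which of the three broad contexts it was called in. The following minimal sketch (note_context() and @seen are names made up for illustration) suggests how every scalar subcontext collapses into the same answer:

```perl
use strict;
use warnings;

my @seen;

# Hypothetical helper: record the context of each call, as reported by wantarray
sub note_context {
    push @seen, wantarray ? 'list' : defined wantarray ? 'scalar' : 'void';
    return 1;
}

my @list = note_context();            # list context
note_context();                       # void context
my $total  = 10 + note_context();     # numeric context...
my $string = 'n=' . note_context();   # ...string context...
if (note_context()) { }               # ...and boolean context all look the same

# @seen now holds: list, void, scalar, scalar, scalar
```

The numeric, string, and boolean calls are indistinguishable from inside the subroutine, which is the gap the rest of this guideline addresses.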

Fortunately, there’s a simple way to allow subroutines like get_server_status() to cater for two or more different scalar-context expectations simultaneously. The Contextual::Return CPAN module provides a mechanism by which you can specify that a subroutine returns different scalar values in boolean, numeric, string, hash-ref, array-ref, and code-ref contexts. For example, to allow get_server_status() to simultaneously support all five return behaviours shown at the start of this guideline, you could simply write:

  use Contextual::Return;

  sub get_server_status {
      my ($server_ID) = @_;

      # Acquire server data somehow…
      my %server_data
          = _ascertain_server_status($server_ID);

      # Return different components of that data, depending on call context…
      return (
          LIST    { @server_data{ qw( name uptime load users ) };  }
          BOOL    { $server_data{uptime} > 0;                      }
          NUM     { $server_data{load};                            }
          STR     { "$server_data{name}: $server_data{uptime}, $server_data{load}"; }
          HASHREF { \%server_data;                                 }
      );
  }

Now, in a list context, get_server_status() uses a hash slice to extract the information in the expected order. In a boolean context, it returns true if the uptime is nonzero. In a numeric context, it returns the server load. In a string context, a string summarizing the server’s status is returned. And when the return value is expected to be a hash reference, get_server_status() simply returns a reference to the entire %server_data hash.

Note that each of those alternative return values is lazily evaluated. That means, on any given call to get_server_status() , only one of the five contextual return blocks is actually executed.
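To get a feel for how a single return value can answer differently in different scalar subcontexts, here is a rough sketch that uses only the core overload pragma. It is not a description of how Contextual::Return works internally; the ServerStatus class and the hard-coded server data are assumptions made up for this illustration. Each conversion subroutine runs only when the corresponding kind of value is actually requested.

```perl
use strict;
use warnings;

# Hypothetical wrapper class: one object that answers differently
# depending on how the caller uses it
package ServerStatus;

use overload
    'bool'   => sub { $_[0][0]{uptime} > 0 },    # boolean context
    '0+'     => sub { $_[0][0]{load} },          # numeric context
    '""'     => sub {                            # string context
        my $data = $_[0][0];
        return "$data->{name}: $data->{uptime}, $data->{load}";
    },
    '%{}'    => sub { $_[0][0] },                # hash-ref context
    fallback => 1;

# Keep the data hash inside a blessed array, so the '%{}' overload
# is never triggered by the object's own internals
sub new {
    my ($class, %data) = @_;
    return bless [ {%data} ], $class;
}

package main;

sub get_server_status {
    my ($server_ID) = @_;

    # Made-up data standing in for a real status lookup
    my %server_data = (
        name => "server$server_ID", uptime => 42, load => 0.75, users => 3,
    );

    return @server_data{ qw( name uptime load users ) } if wantarray;
    return ServerStatus->new(%server_data);
}

my $total_load = 0;
$total_load += get_server_status(7);              # uses the '0+' conversion
my $is_up   = get_server_status(7) ? 1 : 0;       # uses 'bool'
my $summary = get_server_status(7) . "\n";        # uses '""'
my $users   = get_server_status(7)->{users};      # uses '%{}'
```

The hand-rolled version needs a separate class and careful handling of the object's internals, which hints at why a declarative module is the more maintainable choice.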

Even in cases where you don’t need to distinguish between so many alternatives, the Contextual::Return module can still improve the maintainability of your code, compared to using the built-in wantarray. The module allows you to say explicitly what you want to happen in different return contexts, and to label each of those outcomes with an obvious keyword. For example, suppose you had a subroutine such as:

  use List::Util qw( first );

  sub defined_samples_in {
      if (wantarray) {
          return grep {defined $_} @_;
      }

      return first {defined $_} @_;
  }

Without changing its behaviour at all, you could make the code considerably more self-documenting, and emphasize the inherent symmetry of the list and scalar cases, by rewriting it with a single contextual return:

  use Contextual::Return;
  use List::Util qw( first );

  sub defined_samples_in {
      return (
          LIST   { grep {defined $_} @_  }
          SCALAR { first {defined $_} @_ }
      );
  }

Besides producing more explicit and less cluttered code, this approach is more maintainable, too. When you need to extend the return behaviour of the subroutine, to more precisely match the expectations of those who use it, you can just add extra labeled return contexts, anywhere in the return list:

  use Contextual::Return;
  use List::Util qw( first );

  sub defined_samples_in {
      return (
          LIST     { grep {defined $_} @_        }  # All defined vals
          SCALAR   { first {defined $_} @_       }  # One defined val
          NUM      { scalar grep {defined $_} @_ }  # How many vals defined?
          ARRAYREF { [ grep {defined $_} @_ ]    }  # Return vals in an array
      );
  }

Regardless of the order in which the alternatives appear, Contextual::Return will automatically select the most appropriate behaviour in each call context.

Prototypes


Don’t use subroutine prototypes.


Subroutine prototypes allow you to make use of more sophisticated argument-passing mechanisms than Perl’s “usual list-of-aliases” behaviour. For example:

  sub swap_arrays (\@\@) {
      my ($array1_ref, $array2_ref) = @_;

      my @temp_array = @{$array1_ref};
      @{$array1_ref} = @{$array2_ref};
      @{$array2_ref} = @temp_array;

      return;
  }

  # and later…

  swap_arrays(@sheep, @goats);      # Implicitly pass references

The problem is that anyone who uses swap_arrays(), and anyone who subsequently has to maintain that code, has to know about that subroutine’s special magic. Otherwise, they will quite naturally assume that the two arrays will be flattened into a single list and slurped up by the subroutine’s @_, because that’s what happens in just about every other subroutine they ever use.
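The default flattening behaviour that most readers will assume can be seen in a small sketch. The count_args() subroutine is hypothetical and has no prototype; it simply reports how many arguments arrived:

```perl
use strict;
use warnings;

# Hypothetical subroutine with no prototype: just counts its arguments
sub count_args {
    return scalar @_;
}

my @sheep = qw( dolly shaun );
my @goats = qw( billy nanny gruff );

my $flattened  = count_args(@sheep, @goats);      # 5: both arrays flattened into @_
my $referenced = count_args(\@sheep, \@goats);    # 2: one reference per array
```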

Using prototypes makes it impossible to deduce the argument-passing behaviour of a subroutine call simply by looking at the call. They also make it impossible to deduce the context in which particular arguments are evaluated. A subtle but common mistake is to “improve” the robustness of an existing library by putting prototype specifiers on all the subroutines. So a subroutine that used to be defined:

  use List::Util qw( min max );

  sub clip_to_range {
      my ($min, $max, @data) = @_;

      return map { max( $min, min($max, $_) ) } @data;
  }

is updated to:

  sub clip_to_range($$@) { # takes two scalars and an array
      my ($min, $max, @data) = @_;

      return map { max($min, min($max, $_)) } @data;
  }

The problem is that clip_to_range() was being used with an elegant table-lookup scheme:

  my %range = (
      normalized => [-0.5,0.5],
      greyscale  => [0,255],
      percentage => [0,100],
      weighted   => [0,1],
  );

  # and later…

  my $range_ref = $range{$curr_range};
  @samples = clip_to_range( @{$range_ref}, @samples);

The $range{$curr_range} hash look-up returns a reference to a two-element array corresponding to the range that’s currently selected. That array reference is then dereferenced by putting a @{ … } around it. Previously, when clip_to_range() was an ordinary subroutine, that dereferenced array found itself in list context, so it flattened into a list, producing the required minimum and maximum values for the subroutine’s first two arguments.

But now that clip_to_range() has a prototype, things go very wrong. The prototype starts with a $ , which looks like it’s telling Perl that the first argument must be a scalar. But that’s not what prototypes do at all.

What that $ prototype does is tell Perl that the first argument must be evaluated in a scalar context. And what is the first argument? It’s the array produced by @{$range{$curr_range}} . And what do you get when an array is evaluated in a scalar context? The size of the array, which is 2, no matter which entry in %range was actually selected.

The second argument specification in the prototype is also a $. So the second argument to clip_to_range() must also be evaluated in a scalar context. And that second argument? It’s @samples . Evaluating that array in scalar context once again produces its size. The second argument becomes the number of samples.

The final specification in the prototype is a @ , which specifies that any remaining arguments are evaluated in list context. Of course, there aren’t any more arguments now, but the @ specifier doesn’t complain about that. An empty list is still a list, as far as it’s concerned.

Adding a prototype didn’t really improve the robustness of the code very much. Before it was imposed, clip_to_range() would have been passed the selected minimum, followed by the selected maximum, followed by all the data samples. Now, thanks to the wonders of prototyping, clip_to_range() always gets a minimum of 2, followed by a maximum equal to the number of samples, followed by no data. And Perl doesn’t complain at all, since the prototype was successfully matched by the given arguments, even though it hosed them in the process.
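The effect described above can be reproduced with a small sketch. The received_args() subroutine is a hypothetical stand-in for clip_to_range() that carries the same ($$@) prototype and simply reports what it was given; the sample data is made up:

```perl
use strict;
use warnings;

# Hypothetical stand-in for clip_to_range(): returns a copy of its arguments
sub received_args($$@) {
    return [@_];
}

my @range   = (0, 255);
my @samples = (-10, 100, 300);

# Prototype applied: each array is evaluated in scalar context
my $with_prototype = received_args(@range, @samples);        # [2, 3]

# A leading & bypasses the prototype, so the arrays flatten as usual
my $without_prototype = &received_args(@range, @samples);    # [0, 255, -10, 100, 300]
```

The first call silently receives the two array sizes and no data at all, exactly the failure mode the text describes.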

Prototypes cause far more trouble than they avert. Even when they are properly understood and used correctly, they create code that doesn’t behave the way it looks like it ought to, which makes it harder to maintain code that uses them. Furthermore, in OO implementations they engender a completely false sense of security, because they’re utterly ignored in any method call.
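The point about method calls can be checked with a short sketch. The Clipper class and its count_data() method are invented for this illustration:

```perl
use strict;
use warnings;

package Clipper;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# The ($$) prototype appears to demand exactly two scalar arguments
sub count_data($$) {
    my ($self, @data) = @_;
    return scalar @data;
}

package main;

my @samples = (-10, 100, 300);

# A method call ignores the prototype: @samples is flattened as usual
my $count = Clipper->new->count_data(@samples);    # 3
```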

Don’t use prototypes. The only real advantage they can confer is allowing array and hash arguments to effectively be passed by reference:

  swap_arrays(@sheep, @goats);

But even then, if you need pass-by-reference semantics, it’s far better to make that explicit:

  sub swap_arrays {
      my ($array1_ref, $array2_ref) = @_;

      my @temp_array = @{$array1_ref};
      @{$array1_ref} = @{$array2_ref};
      @{$array2_ref} = @temp_array;

      return;
  }

  # and later…

  swap_arrays(\@sheep, \@goats);     # Explicitly pass references

Note that the body of swap_arrays() shown here is exactly the same as in the prototyped version at the start of this guideline. Only the call syntax varies. With prototypes it’s magical, and therefore misleading; without prototypes it’s a little uglier, but shows at a glance exactly what the code is doing.

Implicit Returns


Always return via an explicit return.


If a subroutine “falls off the end” without ever encountering an explicit return, the value of the last expression evaluated in a subroutine is returned. That can lead to completely unexpected return values.

For example, consider this subroutine, which is supposed to return the second odd number in its argument list, or undef if there isn’t a second odd number in the list:

  sub find_second_odd {
      my $prev_odd_found = 0;

      # Check through args…
      for my $num (@_) {
          # Find an odd number…
          if (odd($num)) {
              # Return it if it’s not the first (must be the second)…
              return $num if $prev_odd_found;

              # Otherwise, remember it’s been seen…
              $prev_odd_found = 1;
          }
      }
      # Otherwise, fail
  }

When that subroutine is used, strange things happen:

  if (defined find_second_odd(2..6)) {
      # find_second_odd() returns 5
      # so the if block does execute as expected
  }

  if (defined find_second_odd(2..1)) {
      # find_second_odd() returns undef
      # so the if block is skipped as expected
  }

  if (defined find_second_odd(2..4)) {
      # find_second_odd() returns an empty string (!)
      # so the if block is unexpectedly executed
  }

  if (defined find_second_odd(2..3)) {
      # find_second_odd() returns an empty string again (!)
      # so the if block is unexpectedly executed again
  }

The subroutine works correctly when there is a second odd number to be found, and when there are no numbers at all to be considered, but it behaves—there’s no other word for it—oddly for the in-between cases*. That anomalous empty string is returned because that’s what a failed boolean test evaluates to in Perl. And a failed boolean test is the last expression evaluated in the loop. No, not the conditional in:

    if (odd($num)) {

or in:

        return $num if $prev_odd_found;

The last expression is the (failed) conditional test of the while loop. What while loop? The implicit while loop that the Perl compiler secretly translates every for loop into.

That’s the problem. In order to predict the implicit return value of anything but the simplest subroutine, you not only have to understand the control flow within the subroutine and how that flow may change under different argument lists, but also what sneaky manipulations the compiler is performing on your code before it’s executed.

But none of those complications will ever trouble you if you simply ensure that your subroutines can never “fall off the end”. And all that requires is that every subroutine finishes with an explicit return statement—even if you have to add one “gratuitously”:

  sub find_second_odd {
      my $prev_odd_found = 0;

      # Check through args…
      for my $num (@_) {
          # Find an odd number…
          if (odd($num)) {
              # Return it if it’s not the first (must be the second)…
              return $num if $prev_odd_found;

              # Otherwise, remember it’s been seen…
              $prev_odd_found = 1;
          }
      }
      # Otherwise, fail explicitly
      return;
  }

Now the subroutine always behaves as expected:

  if (defined find_second_odd(2..6)) {
      # find_second_odd() returns 5
      # so the if block is executed, as expected
  }

  if (defined find_second_odd(2..1)) {
      # find_second_odd() returns undef
      # so the if block is skipped, as expected
  }

  if (defined find_second_odd(2..4)) {
      # find_second_odd() returns undef
      # so the if block is skipped, as expected
  }

  if (defined find_second_odd(2..3)) {
      # find_second_odd() returns undef
      # so the if block is skipped, as expected
  }

That extra return is a very small price to pay for perfect predictability.

Note that this rule applies even if your subroutine “doesn’t return anything”. For example, if you’re writing a subroutine to set a global flag, don’t write:

  sub set_terseness {
      my ($terseness) = @_;

      $default_terseness = $terseness;
  }

If the subroutine isn’t supposed to return a meaningful value, make it do so explicitly:

  sub set_terseness {
      my ($terseness) = @_;

      $default_terseness = $terseness;

      return; # Explicitly return nothing meaningful
  }
 

Otherwise, developers who use the code could misinterpret the lack of an explicit return as indicating a deliberate implicit return instead. So they may come to rely on set_terseness() returning the new terseness value. That misinterpretation will become a problem if you later realize that the subroutine actually ought to return the previous terseness value, because that change in behaviour will now break any client code that was previously relying on the “undocumented feature” provided by the implicit return.
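The “undocumented feature” is easy to observe. In this sketch the two subroutine names are made up to keep both variants side by side; the bodies match the two versions of set_terseness() in the text:

```perl
use strict;
use warnings;

my $default_terseness = 0;

# Implicit return: the value of the final assignment leaks out
sub set_terseness_implicitly {
    my ($terseness) = @_;

    $default_terseness = $terseness;
}

# Explicit bare return: nothing meaningful is returned
sub set_terseness_explicitly {
    my ($terseness) = @_;

    $default_terseness = $terseness;

    return;
}

my $leaked  = set_terseness_implicitly(3);    # 3
my $nothing = set_terseness_explicitly(5);    # undef
```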

Returning Failure


Use a bare return to return failure.


Notice that each final return statement in the examples of the previous guideline used a return keyword with no argument, rather than a more-explicit return undef.

Normally, relying on default behaviour is not best practice. But in the case of a return statement, relying on the default return value actually prevents a particularly nasty bug.

The problem with an explicit return undef is that, contrary to most people’s expectations, a returned undef isn’t always false.

Consider a simple subroutine like this:

  use Contextual::Return;
  use List::Util qw( sum );

  sub guesstimate {
      my ($criterion) = @_;

      my @estimates;
      my $failed = 0;

      # [Acquire data for specified criterion]

      return undef if $failed;

      # [Do guesswork based on the acquired data]

      # Return all guesses in list context or average guess in scalar context…
      return (
          LIST   { @estimates;                 }
          SCALAR { sum(@estimates)/@estimates; }
      );
  }

The successful return values are both fine, and completely appropriate for the two contexts in which the subroutine might be called. But the failure value is a serious problem. Since guesstimate() specifically tests for calls in list context, it’s obvious that the subroutine is expected to be called in list contexts:

  if (my @melt_rates = guesstimate('polar melting')) {
      my $model = Std::Climate::Model->new({ polar_melting => \@melt_rates });

      for my $interval (1,2,5,10,50,100,500) {
          print $model->predict({ year => $interval });
      }
  }

But if the guesstimate() subroutine fails, it returns a single scalar value: undef. And in a list context (such as the assignment to @melt_rates), that single scalar undef value becomes a one-element list: (undef). So @melt_rates is assigned that one-element list and then evaluated in the overall scalar context of the if condition. And in scalar context an array always evaluates to the number of elements in it, in this case 1 . Which is true.

Oops!*

What should have happened, of course, is that guesstimate() should have returned a failure value that was false in whatever context it was called, i.e., undef in scalar context and an empty list in list context:

  if ($failed) {
      return (
          LIST   { ()    }
          SCALAR { undef }
      );
  }

But that’s precisely what a return itself does when it’s not given an argument: it returns whatever the appropriate false value is for the current call context. So, by always using a bare return to return a “failure value”, you can ensure that you will never bring about the destruction of the entire planetary ecosystem because of an unexpectedly true undef.
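The difference between the two styles of failure can be checked directly with core Perl. The two subroutine names below are hypothetical, chosen only to contrast the behaviours:

```perl
use strict;
use warnings;

sub fail_with_undef { return undef; }    # always returns a single undef
sub fail_with_bare  { return;       }    # empty list or undef, depending on context

my @from_undef = fail_with_undef();      # (undef): one element
my @from_bare  = fail_with_bare();       # ():      no elements
my $scalar     = fail_with_bare();       # undef

# A list assignment used as a condition yields the number of values returned
my $undef_looks_true = (my @first  = fail_with_undef()) ? 1 : 0;    # 1
my $bare_looks_true  = (my @second = fail_with_bare())  ? 1 : 0;    # 0
```

Only the bare return is false in both list and scalar contexts.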

Meanwhile, Chapter 13 presents a deeper discussion on the most appropriate ways to propagate failure from a subroutine.


* Yep, that’s the sound of alarm bells you’re hearing.

† And if that sample happens to be an integer, then $found will be assigned a numeric value, exactly as expected. It will be the wrong numeric value, but hey, at least that will make the bug much more interesting to track down.

* They’d be the “edge-cases”, except that, in this instance, they’re conceptually in the middle of possibilities.

* And here “Oops!” means: the if block executes despite the failure of guesstimate() to acquire any meaningful data. So, when the climate model requests a numerical polar melting rate, that undef is silently converted to zero. This dwimmery causes the model to show that polar melting rates have absolutely no connection to world climate in general, and to rising ocean levels in particular. So mankind can happily keep burning fossil fuels at an ever-greater rate, secure in the knowledge that it has no effect. Until one day, the only person left is Kevin Costner. On a raft.
