Perl Subroutines: Arguments and Values

In this second part of a three-part series covering subroutines in Perl, you will learn about missing arguments, default argument values, and more. It is excerpted from chapter nine of the book Perl Best Practices, written by Damian Conway (O’Reilly; ISBN: 0596001738). Copyright © 2006 O’Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O’Reilly Media.

Missing Arguments


Use definedness or existence to test for missing arguments.


It’s a common mistake to use a boolean test to probe for missing arguments:

  use Carp;
  use Readonly;

  Readonly my $FILLED_USAGE => 'Usage: filled($text, $cols, $filler)';

  sub filled {
      my ($text, $cols, $filler) = @_;

      croak $FILLED_USAGE
          if !$text || !$cols || !$filler;

      # [etc.]
  }

The problem is that this approach can fail in subtle ways. If, for example, the filler character is ‘0’ or the text to be padded is an empty string, then an exception will incorrectly be thrown.
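To see which legitimate values a boolean test would reject, it can help to run a few candidates through the same test. The following is only a minimal sketch; the helper name looks_missing is hypothetical and exists purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if a boolean test would treat
# the value as "missing", 0 otherwise.
sub looks_missing {
    my ($value) = @_;
    return !$value ? 1 : 0;
}

# Candidate arguments: '0', empty string, 0, '0.0', 'x', undef
my @results = map { looks_missing($_) } ( '0', q{}, 0, '0.0', 'x', undef );

print "@results\n";    # prints "1 1 1 0 0 1"
```

Only the final undef is genuinely missing, yet the first three values are rejected as well.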

A much more robust approach is to test for definedness:

  use List::MoreUtils qw( any );

  sub filled {
      my ($text, $cols, $filler) = @_;

      croak $FILLED_USAGE
          if any {!defined $_} $text, $cols, $filler;

      # [etc.]
  }

Or, if a particular number of arguments is required, and undef is an acceptable value for one of them, test for mere existence:

  sub filled {
      croak $FILLED_USAGE if @_ != 3; # All three args must be supplied

      my ($text, $cols, $filler) = @_;

      # etc.
  }

Existence tests are particularly efficient because they can be applied before the argument list is even unpacked. Testing for the existence of arguments also promotes more robust coding, in that it prevents callers from carelessly omitting a required argument, and from accidentally providing any extras.
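The difference between existence and definedness can be illustrated with a small sketch. The subroutine arg_report below is a hypothetical stand-in (it is not part of the original example) that accepts exactly three arguments and reports which of them are defined:

```perl
use strict;
use warnings;
use Carp;

# Hypothetical subroutine: requires exactly three arguments,
# but allows any of them to be undef.
sub arg_report {
    croak 'Usage: arg_report($text, $cols, $filler)' if @_ != 3;

    my ($text, $cols, $filler) = @_;

    return join q{,},
           map { defined $_ ? 'defined' : 'undef' } $text, $cols, $filler;
}

# An explicit undef still counts as a supplied argument...
my $explicit_undef = arg_report('abc', 10, undef);

# ...whereas omitting an argument, or adding an extra one, throws.
my $too_few_ok  = eval { arg_report('abc', 10);               1 };
my $too_many_ok = eval { arg_report('abc', 10, '*', 'extra'); 1 };
```

Here $explicit_undef holds 'defined,defined,undef', while both eval blocks fail and leave their variables undefined.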

Note that existence tests can also be used when some arguments are optional, because the recommended practice for this case—passing options in a hash—ensures that the actual number of arguments passed is fixed (or fixed-minus-one, if the options hash happens to be omitted entirely):

  sub filled {
      croak $FILLED_USAGE if @_ < 1 || @_ > 2;

      my ($text, $opt_ref) = @_;  # Cols and fill char now passed as options

      # etc.
  }

Default Argument Values


Resolve any default argument values as soon as @_ is unpacked.


The fundamental rule of argument processing is: nothing happens in the subroutine until all the arguments are stable. Don’t, for example, add in defaults on the fly:  

  Readonly my $DEF_PAGE_WIDTH => 78;
  Readonly my $SPACE          => q{ };
  Readonly my $EMPTY_STR      => q{};

  sub padded {
      my ($text, $arg_ref) = @_;

      # Compute left and right spacings...
      my $gap   = ($arg_ref->{cols}||$DEF_PAGE_WIDTH) - length($text||=$EMPTY_STR);
      my $left  = $arg_ref->{centered} ? int($gap/2) : 0;
      my $right = $gap - $left;

      # Prepend and append space...
      my $filler = $arg_ref->{filler} || $SPACE;
      return $filler x $left . $text . $filler x $right;
  }

Apart from making the gap computation much harder to read and to verify, using the || and ||= operators to select default values is equivalent to testing for truth, and therefore much more prone to error on the edge cases (such as a ‘0’ fill character).
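The edge case is easy to reproduce in isolation. This is a minimal sketch, with a hypothetical %options hash standing in for the caller's arguments. The last line uses the defined-or operator (//), which is available from Perl 5.10 onwards and is shown here only as a further point of comparison:

```perl
use strict;
use warnings;

my $SPACE   = q{ };
my %options = ( filler => '0' );    # caller explicitly asks for zero-padding

# Truth test: '0' is false, so the default silently replaces it
my $by_truth = $options{filler} || $SPACE;

# Existence test: the caller's value is kept
my $by_exists = exists $options{filler} ? $options{filler} : $SPACE;

# Definedness test (Perl 5.10+): the caller's value is also kept
my $by_defined = $options{filler} // $SPACE;
```

After this runs, $by_truth contains a space, whereas $by_exists and $by_defined both contain '0'.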

If default values are needed, set them up first. Separating out any initialization will make your code more readable, and simplifying the computational statements is likely to make them less buggy too:

  sub padded {
      my ($text, $arg_ref) = @_;

      # Set defaults...
      #            If option given...        Use option           Else default
      my $cols   = exists $arg_ref->{cols}   ? $arg_ref->{cols}   : $DEF_PAGE_WIDTH;
      my $filler = exists $arg_ref->{filler} ? $arg_ref->{filler} : $SPACE;

      # Compute left and right spacings...
      my $gap   = $cols - length $text;
      my $left  = $arg_ref->{centered} ? int($gap/2) : 0;
      my $right = $gap - $left;

      # Prepend and append space...
      return $filler x $left . $text . $filler x $right;
  }

If there are many defaults to set up, the cleanest way to do that is by factoring the defaults out into a table (i.e., a hash) and then pre-initializing the argument hash with that table, like so:

  Readonly my %PAD_DEFAULTS => (
      cols     => 78,
      centered => 0,
      filler   => $SPACE,
      # etc.
  );

  sub padded {
      my ($text, $arg_ref) = @_;

      # Unpack optional arguments and set defaults...
      my %arg = ref $arg_ref eq 'HASH' ? (%PAD_DEFAULTS, %{$arg_ref})
              :                          %PAD_DEFAULTS;

      # Compute left and right spacings...
      my $gap   = $arg{cols} - length $text;
      my $left  = $arg{centered} ? int($gap/2) : 0;
      my $right = $gap - $left;

      # Prepend and append space...
      return $arg{filler} x $left . $text . $arg{filler} x $right;
  }

When the %arg hash is initialized, the defaults are placed ahead of the arguments supplied by the caller ((%PAD_DEFAULTS, %{$arg_ref})). So the entries in the default table are assigned to %arg first. Those default values are then overwritten by any entries from $arg_ref.
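The overwriting behaviour relies on the fact that, when a list of key/value pairs is assigned to a hash, a later pair replaces an earlier pair with the same key. A small sketch, using plain lexical hashes with illustrative values rather than the Readonly table above:

```perl
use strict;
use warnings;

my %defaults = ( cols => 78, centered => 0, filler => q{ } );
my $arg_ref  = { cols => 40, filler => '0' };    # hypothetical caller options

# Defaults first, caller's options second: the later pairs win
my %arg = ( %defaults, %{$arg_ref} );

print "$arg{cols} $arg{centered} $arg{filler}\n";    # prints "40 0 0"
```

Note that the caller's '0' filler survives, because no truth test is involved in the merge.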

Scalar Return Values


Always return scalar in scalar returns.


One of the more subtle features of Perl subroutines is the way that their call context propagates to their return statements. In most places in Perl, the context (list, scalar, or void) can be deduced at compile time. One place where it can’t be determined in advance is to the right of a return. The argument of a return is evaluated in whatever context the subroutine itself was called.

That’s a very handy feature, which makes it easy to factor out or rename specific uses of built-in functions. For example, if you found yourself repeatedly filtering undefined and negative values out of lists:

  @valid_samples = grep {defined($_) && $_ >= 0} @raw_samples;

it would be better to encapsulate that complex filter and rename it more meaningfully:

  sub valid_samples_in {
      return grep {defined($_) && $_ >= 0} @_;
  }

  # and then…

  @valid_samples = valid_samples_in(@raw_samples);

Because the return expression is always evaluated in the same context as the surrounding call, it’s also still okay to use this subroutine in scalar context:

  if (valid_samples_in(@raw_samples) < $MIN_SAMPLE_COUNT) {
      report_sensor_malfunction();
  }

When the subroutine is called in scalar context, its return statement imposes scalar context on the grep, which then returns the total number of valid samples—just as a raw grep would do in the same position.

Unfortunately, it’s easy to forget about the contextual lycanthropy of a return, especially when you write a subroutine that is “only ever going to be used one way”. For example:

  sub how_many_defined {
      return grep {defined $_} @_;
  }

  # and "always" thereafter:

  my $found = how_many_defined(@raw_samples);

But eventually someone will write:

  my ($found) = how_many_defined(@raw_samples);

and introduce a very subtle bug. The parentheses around $found put it in a list context, which puts the call to how_many_defined() in a list context, which puts the grep inside how_many_defined() in a list context, which causes the return to return the list of defined samples, the first of which is then assigned to $found.

If there were even the slightest chance that this scalar-returning subroutine might ever be called in a list context, it should have been written as follows:

  sub how_many_defined {
      return scalar grep {defined $_} @_;
  }

There is no shame in using an explicit scalar anywhere you know you want a scalar but you’re not confident of your context. And because you can never be confident of your context in a return statement, an explicit scalar is always acceptable there.
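The effect of the explicit scalar can be checked side by side. In this sketch the two subroutine names (with the _loose and _strict suffixes) and the sample data are hypothetical, chosen only so that both variants can coexist in one file:

```perl
use strict;
use warnings;

# Context-sensitive version: returns a list when called in list context
sub how_many_defined_loose {
    return grep { defined $_ } @_;
}

# Explicit version: returns the count regardless of call context
sub how_many_defined_strict {
    return scalar grep { defined $_ } @_;
}

my @raw_samples = ( 42, undef, 17, undef, 99 );

my $loose_scalar  = how_many_defined_loose(@raw_samples);     # scalar context
my ($loose_list)  = how_many_defined_loose(@raw_samples);     # list context
my ($strict_list) = how_many_defined_strict(@raw_samples);    # list context

print "$loose_scalar $loose_list $strict_list\n";    # prints "3 42 3"
```

The middle value is the first defined sample rather than a count, which is exactly the kind of result that can go unnoticed when the first sample happens to be a plausible number.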

At the very least, you should always add one anywhere that a previously mistaken expectation regarding context has already bitten you. That way, the same misconception won’t bite whoever is eventually responsible for the care and feeding of your code (that is, most likely you again, six months later).

Contextual Return Values


Make list-returning subroutines return the “obvious” value in scalar context.


There is only one kind of list in Perl, so returning in a list context is easy—you just return all the values you produced:

  sub defined_samples_in {
      return grep {defined $_} @_;
  }

But what should that subroutine return in a scalar context? It might legitimately return an integer count (like grep itself does), in which case the subroutine stays exactly the same:

  sub defined_samples_in {
      return grep {defined $_} @_;
  }

Or it might instead return some serialized string representation of the list (like localtime does in scalar context):

  sub defined_samples_in {
      my @defined_samples = grep {defined $_} @_;

      # Return all defined args in list context...
      if (wantarray) {
          return @defined_samples;
      }

      # Otherwise a serialized version in scalar context...
      return join($COMMA, @defined_samples);
  }

Or it might return the “next” value in a series (like readline does):

  use List::Util qw( first );

  sub defined_samples_in {
      # Return all defined args in list context...
      if (wantarray) {
          return grep {defined $_} @_;
      }

      # Or, in scalar context, extract the first defined arg...
      return first {defined $_} @_;
  }

It might try to preserve as much information as possible and return the full list of values using an array reference (which no Perl 5 builtin does):

  sub defined_samples_in {
      my @defined_samples = grep {defined $_} @_;

      # Return all defined args in list context...
      if (wantarray) {
          return @defined_samples;
      }

      # Return all defined args (indirectly, via an array ref) in scalar context...
      return \@defined_samples;
  }

It might even give up in disgust (like sort does):

  sub defined_samples_in {
      croak q{Useless use of 'defined_samples_in' in a non-list context}
          if !wantarray;

      return grep {defined $_} @_;
  }
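All of these variants hinge on wantarray, which returns a true value in list context, a defined but false value in scalar context, and undef in void context. A minimal sketch that makes the three cases visible (the subroutine name context_name is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: names the context in which it was called
sub context_name {
    return wantarray         ? 'list'
         : defined wantarray ? 'scalar'
         :                     'void';
}

my @in_list   = context_name();    # list context
my $in_scalar = context_name();    # scalar context

print "$in_list[0] $in_scalar\n";    # prints "list scalar"
```

Testing defined wantarray is what allows a subroutine to distinguish a scalar call from a call whose result is simply discarded.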

Perl’s list-returning builtins don’t have a consistent behaviour in scalar context. They try to “do the right thing” on a case-by-case basis. Mostly they get it right; the scalar context results of grep, and localtime, and readline are what most people expect them to be.

Unfortunately, they don’t always get it right. The scalar return values of select, readpipe, splice, unpack, and the various get… functions can be surprising to infrequent users of these functions. They have to be either memorized or repeatedly looked up in the fine manual. For many people, this makes using those builtins harder than it should be.

Don’t perpetuate those difficulties in your own development. If you’re writing a library of subroutines, make them predictable. Make every list-returning subroutine return the “obvious” value in scalar context.

What’s the “obvious” value? It’s the value that the developers who use the subroutine actually expect it to return. For example, if they all use defined_samples_in() like so:

  if ( defined_samples_in(@samples) > 0 ) {
      process(@samples);
  }

then they obviously expect it to return a count of defined samples. So the “obvious” scalar context return value is that count.

On the other hand, if everyone uses it like this:

  my $floor_samples_ref         = defined_samples_in(@floor_samples);
  my $restocked_samples_ref     = defined_samples_in(@restocked_samples);

  # and later…

  swap_arrays($floor_samples_ref, $restocked_samples_ref);

then the expectation is clearly that the subroutine returns a reference to the array of results. So that’s the “obvious” scalar return value.

In other words, the “obvious” return value in a scalar context is whatever the people who use your code think it’s going to be (before they read the fine manual). That definition of obviousness presents a dilemma, though. The way you work out whether your proposed scalar-context behaviour is obvious is by implementing it and seeing how many people it trips up. But once the subroutine is deployed and client code is relying on it, it’s too late to change its return value if that value turns out not to be what most people expect.

The solution (which is discussed in greater detail in Chapter 17) is to “play test” the subroutine before it’s deployed. That is, ask the people who will actually be using your subroutine what they expect it will do in scalar context. Or, better yet, have them write sample code that uses the subroutine, and see how they use it. If you get a consensus (or even just a simple majority opinion), implement that. If you don’t get agreement on a single “obvious” behaviour, see the “Multi-Contextual Return Values” guideline later in this chapter.

Unfortunately, getting this kind of preliminary feedback isn’t always feasible. In such cases, you should simply select a reasonable default, based on the three fundamental categories of list-returning subroutines: homogeneous, heterogeneous, and iterative.

A homogeneous list-returning subroutine is one that returns a list of data values that are all of a single type: a list of samples, a list of names, or a list of images. Perl’s built-in map, grep, and sort are examples of this type of subroutine. Because no one value in a homogeneous list is more significant than any other, the only interesting property of the list in a scalar context is usually the number of values it contains. Hence, in scalar contexts, homogeneous subroutines are usually expected to return a count, as map and grep both do.

A heterogeneous list-returning subroutine is one that returns a list containing distinct pieces of information: name, rank, and serial number; account number, account name, and balance; year, month, day. For example, the stat, caller, and getpwent builtins are all heterogeneous. The lists returned by subroutines of this type often do have a single piece of information that is more significant than any other, and they’re typically expected to return that value in scalar contexts. For example, caller returns the caller’s package name, whilst getpwent returns the relevant username.

Alternatively, all of the information returned by a heterogeneous subroutine might be equally important. So this type of subroutine is sometimes expected to return some kind of serialized representation of that information in scalar context, as localtime and gmtime do.

An iterative list-returning subroutine is one that returns an iterated series of values, typically the result of successive input operations. The builtins readline and readdir work this way. Iterative subroutines are always used for stepping through sequences of data, so in a scalar context, they should always return the result of a single iteration.
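The scalar-context behaviour of one builtin from each category can be observed directly. The following sketch uses an in-memory filehandle and made-up data so that it is self-contained; the subroutine name calling_package is hypothetical:

```perl
use strict;
use warnings;

# Homogeneous: grep returns a count in scalar context
my $count = grep { defined $_ } ( 1, undef, 2 );

# Heterogeneous: caller returns just the package name in scalar context
sub calling_package {
    my $package = caller;
    return $package;
}
my $package = calling_package();

# Iterative: readline returns a single line in scalar context,
# and all remaining lines in list context
my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $one_line = readline($fh);
my @the_rest = readline($fh);
close $fh;
```

When run as a top-level script, $count is 2, $package is 'main', $one_line is "first\n", and @the_rest holds the remaining two lines.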

Remember, though, that these suggested default behaviours are recommendations, not natural laws. You may find that your “play testing” suggests that some other return value is more appropriate—more expected—in your particular subroutine. In that case, you should implement and deploy that behaviour instead, and then explicitly document the reasons for your choice.

Please check back next week for the conclusion to this article. 
