Templating Tools

You may have created your own templating system in Perl to meet certain project requirements, but did you know there is a better way? This article, the first in a five-part series, explores your options. It is excerpted from chapter three of Advanced Perl Programming, Second Edition, written by Simon Cozens (O’Reilly; ISBN: 0596004567). Copyright © 2007 O’Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O’Reilly Media.

A recent thread on comp.lang.perl.moderated enumerated the Perl rites of passage–the perfectly good wheels that every journeyman Perl programmer reinvents. These were found to be a templating system, a database abstraction layer, an HTML parser, a processor for command-line arguments, and a time/date handling module.

See if you recognize yourself in the following story: you need to produce a form letter of some description. You’ve got a certain amount of fixed content, and a certain amount that changes. So you set up a template a little like this:

  my $template = q{
      Dear $name,

      We have received your request for a quote for $product, and have
      calculated that it can be delivered to you by $date at a cost of
      approximately $cost.

      Thank you for your interest,

      Acme Integrated Foocorp.
  };

Then you struggle with some disgusting regular expression along the lines of s/(\$\w+)/$1/eeg, and eventually you get something that more or less does the job.
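As a rough illustration of that kind of substitution, here is a minimal sketch. The variable values and the one-line template are made up for the example. The /ee modifier evaluates the replacement twice, so the text '$name' is itself run as Perl code, which is also why this approach becomes risky as soon as templates come from anyone you don't trust.

```perl
use strict;
use warnings;

# Hypothetical values for the placeholders in the template.
my ($name, $product, $date, $cost) =
    ("Ms. Jones", "widgets", "Friday", "50 dollars");

# q{} is single-quoting, so the variables are NOT interpolated here.
my $template =
    q{Dear $name, your $product will arrive by $date and cost about $cost.};

# The first /e turns $1 into the string '$name'; the second /e evaluates
# that string as Perl code, yielding the variable's value.
(my $letter = $template) =~ s/(\$\w+)/$1/eeg;

print "$letter\n";
```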

As with all projects, the specifications change two days after it goes live, so you suddenly need to extend your simple template to handle looping over arrays, conditionals, and eventually executing Perl code in the middle of the template itself. Before you realize what’s happened, you’ve created your own templating language.

Don’t worry if that’s you. Nearly everyone’s done it at least once. That’s why there’s a wide selection of modules on CPAN for templating text and HTML output, ranging from ones only slightly more complex than s/(\$\w+)/$1/eeg to complete, independent templating languages.

Before we start looking at these modules, though, let’s consider the built-in solution–the humble Perl format.

Formats and Text::Autoformat

Formats have been in Perl since version 1.0. They’re not used very much these days, but for a lot of what people want from text formatting, they’re precisely the right thing.

Perl formats allow you to draw up a picture of the data you want to output, and then paint the data into the format. For instance, in a recent application, I needed to display a set of IDs, dates, email addresses, and email subjects with one line per mail. If we assume that the line is fixed at 80 columns, we may need to truncate some of those fields and pad others to wider than their natural width. In pure Perl, there are basically three ways to get this sort of formatted output. There’s sprintf (or printf) and substr:

  for (@mails) {
      printf "%5i %10s %40s %21s\n",
          $_->id,
          substr($_->received,0,10),
          substr($_->from_address,-40,40),
          substr($_->subject,0,21);
  }

Then there’s pack, which everyone forgets about (and which doesn’t give as much control over truncation):

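  for (@mails) {
      # Whitespace inside a pack template is ignored, so the newline
      # has to be printed separately.
      print pack("A5 A10 A40 A21",
          $_->id, $_->received, $_->from_address, $_->subject), "\n";
  }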

And then there’s the format:

  format STDOUT =
  @<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<
  $_->id, $_->received, $_->from_address, $_->subject
  .

  for (@mails) {
      write;
  }

Personally, I think this is much neater and more intuitive than the other two solutions–and has the bonus that it takes the formatting away from the main loop, making the code less cluttered.

Formats are associated with a particular filehandle; as you can see from the example, we’ve determined that this format should apply to anything we write on standard output. The picture language of formats is pretty simple: fields begin with @ or ^ and are followed by <, |, or > characters specifying left, center, and right justification, respectively. After each line of fields comes a line of comma-separated expressions that fill those fields, one expression for each field. If we like, we could change the format to multiple lines of fields and expressions:

  format STDOUT =
  Id      : @<<<<
  $_->id
  Date    : @<<<<<<<
  $_->received
  From    : @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  $_->from_address
  Subject : @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  $_->subject
  .
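To experiment with the picture language without setting up a filehandle, you can feed a picture straight to the built-in formline function, which appends its result to the accumulator variable $^A. The following is a minimal sketch along the lines of the swrite helper described in perlform; the pictures and values are invented for illustration, and the square brackets are just literal text that makes the field edges visible.

```perl
use strict;
use warnings;

# Fill one picture with values and return the result as a string.
# formline() appends its output to the accumulator variable $^A.
sub swrite {
    my ($picture, @values) = @_;
    $^A = "";
    formline($picture, @values);
    return $^A;
}

# Single quotes keep Perl from treating @ as the start of an array name.
my $picture = '[@<<<<] [@>>>>] [@||||||]' . "\n";

# Short values are padded according to the justification characters...
print swrite($picture, "ab", "cd", "ef");

# ...and values wider than their field are truncated.
print swrite($picture, "abcdefgh", 12345678, "centered text");
```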

We’ve seen examples of the @-type field. If you’re dealing with multi-line formats, you might find that you want to break up a value and show it across several lines of the format. For instance, to display the start of an email alongside metadata about it:

  Id      : 1                        Hi Simon, Thank you for the
  Date    : 10/12/02                 supply of widgets that you sent
  From    : fred@funglyfoobar.com    me last week. I can assure you
  Subject : Widgets                  that they have all been put ...

This is where the other type of field, the ^ field, comes in: you can achieve the preceding output by using a format like this:  

  format STDOUT =
  Id      : @<<<<                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
            $_->id,                  $message
  Date    : @<<<<<<<                 ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
            $_->received,            $message
  From    : @<<<<<<<<<<<<<<<<<<<<    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
            $_->from_address,        $message
  Subject : @<<<<<<<<<<<<<<<<<<<<... ^<<<<<<<<<<<<<<<<<<<<<<<<<<<...
            $_->subject,             $message
  .

Unlike the values supplied to an @ field, which can be any Perl expression, the value supplied to a ^ field must be an ordinary scalar variable, because the format modifies it. Each time the format processor sees a ^ field, it outputs as much as it can from the supplied value and then chops that much off the beginning of the value for the next iteration. The ... at the end of a field indicates that if the supplied value is too long, the format should truncate the value and show three dots instead. If you use ^ fields with values found in lexical variables, such as $message in the previous example, you need to declare the lexical variable before the format, or else it won’t be able to see the variable.
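The chopping behavior of ^ fields is easy to watch in isolation. This minimal sketch again uses formline and $^A rather than a full format; the sample text and the ten-column field width are arbitrary choices for the example.

```perl
use strict;
use warnings;

my $message = "one two three four";   # made-up sample text
my @lines;

# Each pass fills one 10-column ^ field and removes the printed text
# from $message. The counter only guards against a runaway loop.
for (1 .. 10) {
    last unless length $message;
    $^A = "";
    formline('^<<<<<<<<<' . "\n", $message);
    push @lines, $^A;
}

print @lines;
print "left over: '$message'\n";
```

Note that $message is empty once the loop finishes, which is why a ^ field needs a real variable and not an arbitrary expression.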

Another boon of using formats is that you can set a header to be sent out at the top of each page–Perl keeps track of how many lines have been printed by a format so it knows when to send out the next page. The header for a particular filehandle is a format named with _TOP appended to the filehandle’s name. The simple use of this is to give column headers to your one-line records:

  format STDOUT_TOP =
  ID    Received   From                                     Subject
  ===== ========== ======================================== ====================
  .

  format STDOUT =
  @<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<
  $_->id, $_->received, $_->from_address, $_->subject
  .

Formats are quite handy, especially as you can associate different formats with different filehandles and send data out to multiple locations in different ways. On the other hand, they have some serious shortcomings that you should bear in mind if you’re thinking of using them in a bigger application.

First, they’re a camping ground for obscure special variables: $% is the current format page number, $= is the number of printable lines per page, $- is the number of lines currently left on the page, $~ is the name of the current output format, $^ is the name of the current header format, and so on. I could not remember a single one of these variables and had to look them up in perlvar.
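Two of those variables, $~ and $^, are at least useful for attaching named formats to a filehandle whose name doesn't match. The sketch below follows the approach described in perlfaq5: it writes to an in-memory filehandle and selects that handle while setting the per-filehandle variables. The format names MAIL and MAIL_TOP and the sample records are invented for the example.

```perl
use strict;
use warnings;

# Package variables, so the formats below can see them.
our ($id, $subject);

format MAIL_TOP =
ID    Subject
===== ====================
.

format MAIL =
@<<<< @<<<<<<<<<<<<<<<<<<<
$id,  $subject
.

# Write into an in-memory filehandle so the result ends up in $report.
open my $out, '>', \my $report
    or die "Cannot open in-memory handle: $!";

# $~ and $^ are per-filehandle, so select the handle while setting them.
my $previous = select($out);
$~ = "MAIL";        # name of the body format
$^ = "MAIL_TOP";    # name of the header format
select($previous);

for my $mail ([1, "Widgets"], [2, "More widgets"]) {
    ($id, $subject) = @$mail;
    write($out);
}
close $out;

print $report;
```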

Formats also deal pretty badly with lexical variables, changing filehandles, variable-length lines, changing formats on the fly, and so on. But they’re handy for neat little hacks.

For complete details on Perl’s built-in formats, read perlform.

Text::Autoformat

There’s a more 21st century way to deal with formatting, however, and that’s the Text::Autoformat module. This has two main purposes–it wraps text more sensitively than the usual Text::Wrap module or the Unix fmt command, and it provides a syntactically simpler but more featureful replacement for the built-in format language.

Text::Autoformat’s text wrapping capabilities are only tangentially related to templating, but they’re still worth mentioning here.

The idea behind autoformat is to solve the problem of wrapping structured text; it was created specifically for email messages (with special consideration for quoted text, signatures, etc.), but it’s applicable to any structured textual data. For instance, given the text:

  You have:
     * a splitting headache
     * no tea
     * your gown (being worn)
              It looks like your gown contains:
         . a thing your aunt gave you which you don’t know what it is
         . a buffered analgesic
         . pocket fluff

fmt fails rather spectacularly:

  You have:
      * a splitting headache * no tea * your gown
      (being worn)
              It looks like your gown contains:
          . a thing your aunt gave you which
          you don’t know what it is . a buffered
          analgesic . pocket fluff

In this case, the autoformat subroutine does things a lot better, as it looks ahead at the structure of the text it’s formatting:

  You have:
      * a splitting headache
      * no tea
      * your gown (being worn) It looks like your
        gown contains:
          . a thing your aunt gave you which you
            don’t know what it is
          . a buffered analgesic
          . pocket fluff
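Calling the wrapper takes very little code. Here is a minimal sketch, assuming Text::Autoformat is installed from CPAN; the shortened sample text and the right margin of 50 are arbitrary choices, and the exact layout of the result depends on the margins you pick.

```perl
use strict;
use warnings;
use Text::Autoformat;   # CPAN module; exports autoformat() by default

my $text = <<'END';
You have:
     * a splitting headache
     * no tea
     * your gown (being worn)
END

# "all => 1" asks for every paragraph to be reformatted, not only the
# first one; "right" sets the right margin.
my $tidy = autoformat($text, { all => 1, right => 50 });

print $tidy;
```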

Text::Autoformat’s format language is quite similar to Perl’s native one, but with some simplifications. First, the distinction between filling @ fields and continuing ^ fields is made by the choice of picture character, not the prefix to the field. Hence, what was:

  @<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<

now simply becomes:

  <<<<< <<<<<<<<<< <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< <<<<<<<<<<<<<<<<<<<<<

For continuation formats, you now use [ and ], which repeat as necessary on subsequent lines:

Id      : <<<<<
Message : 
        [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[

This will produce output like the following:

  Id      :    1
  Message :
          Hi Simon, Thank you for the supply of widgets that you sent me
          last week. I can assure you that they have all been put to good…

Unlike Perl’s built-in continuation formats, however, be aware that the [ and ] lines repeat the entire format time and time again until the variable is completely printed out. So this, for instance, won’t do what you expect:

  Id    : <<<<< [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
[[[[[[[[[[[[[[[[[[[[

Instead, it’ll produce output something like this:

  Id    : 1     Hi Simon, Thank you for the supply of widgets that you sent
  Id    :       me last week. I can assure you that they have all been put
  Id    :       to good use, and have been found, as usual to be the very…

with even more spectacularly bad results for formats longer than one line.

One big advantage, though, is that with Text::Autoformat, formats are just plain strings instead of cleverly compiled patterns interleaved with code. These strings are processed with the form function, which needs to be exported specifically:

  use Text::Autoformat qw(form);

  my $format = <<EOF;
  Id      : <<<<<
  Date    : <<<<<<<<
  From    : <<<<<<<<<<<<<<<<<<<<<
  Subject : <<<<<<<<<<<<<<<<<<<<<…
  EOF
  my $id = 10;
  my $date = "20/12/02";
  my $from = "Fred Foonly";
  my $subject = "Autoformatted message";
  print form($format, $id, $date, $from, $subject);

Text::Autoformat also provides extremely flexible control over the hyphenation of form fields in a multi-line block, including the ability to plug in other hyphenation routines such as Jan Pazdziora’s TeX::Hyphen, which implements the hyphenation algorithm used in Donald Knuth’s TeX package. The main disadvantage, however, is that you don’t get the same control over headers and footers as you would with write.

Both Perl formats and Text::Autoformat are great for producing formatted output in the style of 1980s form-based programs, but when people think of forms these days, they’re more likely to think of things like form letters. Let’s move on to look at modules that are more suited to this style of templating.

Text::Template

Mark-Jason Dominus’ Text::Template has established itself as the de facto standard templating system for plain text. Its templating language is very simple indeed–anything between { and } is evaluated by Perl; everything else is left alone.

It is an object-oriented module–you create a template object from a file, filehandle, or string, and then you fill it in:

  use Text::Template;
  my $template = Text::Template->new(TYPE => "FILE",
                                     SOURCE => "email.tmpl");

  my $output = $template->fill_in();

So, let’s say we’ve got the following template:

  Dear {$who},

  Thank you for the {$modulename} Perl module, which has saved me
  {$hours} hours of work this year. This would have left me free to play
  { int($hours*2.4) } games of go, which I would have greatly appreciated
  had I not spent the time goofing off on IRC instead.

  Love,
  Simon

We set up our template object and our variables, and then we process the template:

  use Text::Template;
  my $template = Text::Template->new(TYPE => "FILE",
                                     SOURCE => "email.tmpl");

  $who = "Mark";
  $modulename = "Text::Template";
  $hours = 15;
  print $template->fill_in();

And the output would look like:

  Dear Mark,

  Thank you for the Text::Template Perl module, which has saved me
  15 hours of work this year. This would have left me free to play
  36 games of go, which I would have greatly appreciated
  had I not spent the time goofing off on IRC instead.

  Love,
  Simon

Notice that the fill-in variables–$who, $modulename, and so on–are not my variables. When you think about it, this ought to be obvious–the my variables are not in Text::Template’s scope, and therefore it wouldn’t be able to see them. This is a bit unpleasant: Text::Template has access to your package variables, and you have to do a bit more work if you want to avoid giving use strict a fit.

Text::Template has two solutions to this. The first is pretty simple–just move the fill-in variables into a completely different package:

  use Text::Template;
  my $template = Text::Template->new(TYPE => "FILE",
                                     SOURCE => "email.tmpl");

  $Temp::who = "Mark";
  $Temp::modulename = "Text::Template"; 
  $Temp::hours = 15;
  print $template->fill_in(PACKAGE => "Temp");

That’s slightly better, but it still doesn’t please people for whom global variables are pure evil. If that’s you, you can get around the problem by passing in a portable symbol table–that is, a hash:

  use Text::Template;
  my $template = Text::Template->new(TYPE => "FILE",
                                     SOURCE => "email.tmpl");

  print $template->fill_in(HASH => {
      who => "Mark",
      modulename => "Text::Template",
      hours => 15
  });
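If you want to try the hash approach without creating email.tmpl, the template can also be supplied as a string. This is a minimal sketch, assuming Text::Template is installed from CPAN; the one-line template is a shortened stand-in for the letter above, not the original file.

```perl
use strict;
use warnings;
use Text::Template;   # CPAN module

# TYPE => "STRING" takes the template text directly, so no template
# file is needed for this sketch.
my $template = Text::Template->new(
    TYPE   => "STRING",
    SOURCE => 'Dear {$who}, {$modulename} saved me {$hours} hours, '
            . 'or { int($hours*2.4) } games of go.',
) or die "Could not construct template: $Text::Template::ERROR";

# The hash keys become variables visible only inside the template,
# so nothing here needs to be a package variable.
my $output = $template->fill_in(HASH => {
    who        => "Mark",
    modulename => "Text::Template",
    hours      => 15,
});

print "$output\n";
```

Because every fill-in value lives in the hash, this version runs cleanly under use strict.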

Please check back next week for the continuation of this article.
