Perl 101: The email form - Refining Your Script (Page 3 of 5)
So far we have a working script, but it lacks many of the refinements which distinguish amateur scripting from professional scripting. More importantly, it is insecure: if your web server is cracked because of one of your CGI scripts, your provider will not be very happy.
The golden rule in CGI programming is to never trust user input. Some users are careless, others are malicious. Just because you ask them for their name, it doesn't guarantee that they won't enter a series of commands which *could* cause your web server to display its password files.
Characters such as '|`>< have special meanings to Perl and to the shell that Perl may call, and can be used by the ingenious cracker to make your script do things it shouldn't. In fact, it's much safer to list what we *should* allow rather than what we should *disallow*. In this particular case, limiting user input to letters, numbers, underscores, spaces, periods, question marks, exclamation marks, hyphens, and at signs (@) should be sufficient.
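As a rough sketch of that allow-list, the check below accepts a value only if every character is one of the kinds listed above. The subroutine name is_allowed_input and the sample inputs are made up for illustration and are not part of the script built in this series. Bear in mind that \w can match more than plain ASCII letters and digits, depending on how the input has been decoded.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns 1 only if
# $value is non-empty and consists solely of word characters (letters,
# digits, underscore), spaces, periods, question marks, exclamation marks,
# at signs and hyphens. Returns 0 otherwise.
sub is_allowed_input {
    my ($value) = @_;
    return 0 unless defined $value;
    # \A and \z anchor the match to the very start and very end of the
    # string, so every character must belong to the class in between.
    return $value =~ /\A[\w .?!\@\-]+\z/ ? 1 : 0;
}

# Try the check on a few invented inputs.
for my $input ('Jane Doe', 'jane_doe@example.com', 'Hi there!',
               'x; cat /etc/passwd', '`ls`') {
    printf "%-22s %s\n", $input,
        is_allowed_input($input) ? 'accepted' : 'rejected';
}
```

The first three inputs are accepted; the last two are rejected because they contain a semicolon, a slash, or backticks, none of which are on the list.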
Here's where the fun starts. We're going to be using regular expressions, which are a very important (and often complex) part of Perl. If you are coming from another programming language you may already be familiar with regular expressions (or regexes, as they are called for short). If this is your first time, they can look very daunting, but fear not: all will be explained.
First off, a simple one:
if ($name =~ /[^\w ]/)
{
    # The name contains a character we do not allow: tell the user and stop.
    print "Oops you entered your name incorrectly - please go back and check it<br>";
    die "Invalid name supplied\n";
}
The important part of this is $name =~ /[^\w ]/. Here we are testing to see if the name supplied by the user contains any characters which are not letters, numbers, underscores, or spaces. The two forward slashes are delimiters: everything between them is treated by Perl as a regex, so our regex is:
[^\w ]
\w is a shorthand meaning 'any word character', however Perl's definition of a word character is not what you might expect: to Perl it means 'any letter, number or an underscore'. The blank space following the \w is literal, i.e. Perl will look for a blank space.
The square brackets ([ and ]) form a character class: they match a single character, which may be any one of the alternatives listed inside them, so [\w ] is happy if it finds either a word character *or* a space. Finally, we invert the meaning of [\w ] by putting a caret (^) immediately after the opening bracket, so that [^\w ] matches any one character which is neither a word character nor a space. The position matters: a caret outside the brackets, as in /^[\w ]/, does not invert anything. It anchors the match to the start of the string, so that pattern would only check the first character of the name. Putting it all together:
if ($name =~ /[^\w ]/)
{
    print "Oops you entered your name incorrectly - please go back and check it<br>";
    die "Invalid name supplied\n";
}
Now that you have a basic understanding of regular expressions, this line could be rewritten in English as:
If $name contains any character which is NOT a letter, number, underscore or space, THEN print an error message and stop.
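The caret is easy to misplace, so here is a small sketch comparing three related patterns on the same inputs. The helper name describe and the sample names are invented for this example. It shows that /^[\w ]/ only looks at the first character, that /[^\w ]/ finds a disallowed character anywhere in the string, and that /\A[\w ]+\z/ is an alternative way of demanding that every character is allowed.

```perl
use strict;
use warnings;

# Hypothetical helper for this illustration: reports how three related
# patterns treat the same string.
sub describe {
    my ($name) = @_;
    return +{
        # Caret outside the brackets is an anchor: true if the FIRST
        # character is a word character or a space, whatever follows.
        first_char_ok => ($name =~ /^[\w ]/     ? 1 : 0),
        # Caret inside the brackets is negation: true if ANY character
        # is something other than a word character or a space.
        has_bad_char  => ($name =~ /[^\w ]/     ? 1 : 0),
        # Anchored at both ends: true only if EVERY character is
        # allowed and the string is not empty.
        all_chars_ok  => ($name =~ /\A[\w ]+\z/ ? 1 : 0),
    };
}

for my $name ('Jane Doe', 'Jane|cat', '|cat') {
    my $r = describe($name);
    printf "%-10s first_char_ok=%d has_bad_char=%d all_chars_ok=%d\n",
        $name, @{$r}{qw(first_char_ok has_bad_char all_chars_ok)};
}
```

Notice that 'Jane|cat' passes the first pattern even though it contains a pipe, because only its leading J is examined; the other two patterns both flag it. The negated class does not reject an empty name, whereas the fully anchored version does.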
So that was your first lesson in pattern matching and the bizarre world of regular expressions. If you understand everything we've covered then give yourself a pat on the back. It can be very hard at first, but soon you'll be able to write your own regexes with your eyes shut. If you are still unsure of regular expressions, then please take the time to re-read this page and suss them out; the effort will pay off once you start writing your own Perl scripts.