One area often overlooked in CGI programming is security. In this article Pete looks at common flaws in CGI scripts and how to fix them using Perl's taint mode, user input filtering, and more.
As we've already seen, taintedness follows a value - even if that value is assigned to another variable. There is, however, one way to untaint data - by matching it against a regex and extracting the part we want through a capture group.
if ($something =~ /^([\w.]+)$/) { $cleanvariable = $1; }  # only trust $1 if the match succeeded
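Here we specify that $something must contain only letters, numbers, underscores, or periods (\w does not match whitespace). The ^ and $ anchor the regex to the start and end of the string, so every character in between must belong to this set.
We've also wrapped the [\w.]+ in parentheses, which captures the matched text and lets us make use of the $1, $2, $3 etc shortcuts. In our example $1 contains the whole value of $something (assuming it *did* contain only letters, numbers, underscores, or periods). If the match fails, $1 is not reset and may still hold the result of an earlier match, so always check that the match succeeded before using it.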
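The untainting step described above can be wrapped in a small helper. This is only a sketch: the name untaint_name and the sample inputs are made up for illustration, and the script runs with or without the -T switch (only under -T does the untainting actually matter).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured (and, under -T, untainted)
# value, or undef if the input contains anything other than word
# characters and periods.
sub untaint_name {
    my ($value) = @_;
    return unless defined $value;

    # \z anchors at the very end of the string; unlike $, it does not
    # tolerate a trailing newline.
    if ($value =~ /^([\w.]+)\z/) {
        return $1;    # only read $1 after a successful match
    }
    return;           # no match: report failure instead of guessing
}

# Sample inputs (assumptions, not taken from a real request).
for my $input ('report.txt', 'my file.txt', '../etc/passwd') {
    my $clean = untaint_name($input);
    print defined $clean ? "accepted: $clean\n" : "rejected: $input\n";
}
```

Note that this pattern still accepts a value such as "..", so if the result is going to be used as a filename the pattern probably needs tightening further.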
We can shorten this a little further...
($cleanvariable) = $something =~ /^([\w.]+)$/;
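... since in list context a successful match returns the captured values ($1, $2, $3 etc) as a list. If the match fails, the list is empty and $cleanvariable is left undefined.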
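To illustrate the list-context form with more than one capture group, here is a minimal sketch. The variable names and sample strings are assumptions chosen for the example, not part of any real script.

```perl
use strict;
use warnings;

# Hypothetical input; in a CGI script this would come from the request.
my $filename = 'report.txt';

# In list context a successful match returns ($1, $2, ...).
my ($base, $ext) = $filename =~ /^(\w+)\.(\w+)\z/;

# A failed match returns an empty list, so the variable stays undef.
my ($rejected) = 'not a valid name!' =~ /^([\w.]+)\z/;

print "base=$base ext=$ext\n";
print "second value was ", (defined $rejected ? "accepted" : "rejected"), "\n";
```

Testing the result with defined is what makes the short form safe: an undefined value means the input did not match and should not be used.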
The beauty of taint is that it considers all user input to be unsafe by default, but easily allows us to untaint a variable using a simple regex. Forcing us to perform a pattern match on the variable stops lazy habits from putting our security at risk, and makes us think a little more carefully about just what we expect to find in that data: even though it is not a security risk, it is still good practice to reject a telephone number if it contains letters.
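The telephone number case might look something like the sketch below. The accepted format (an optional leading +, then 7 to 15 digits optionally separated by single spaces or hyphens) is purely an assumption for illustration, as is the helper name clean_phone; adjust the pattern to whatever formats your application really expects.

```perl
use strict;
use warnings;

# Hypothetical rule: optional leading +, then 7 to 15 digits, each of which
# may be preceded by a single space or hyphen.
sub clean_phone {
    my ($raw) = @_;
    return unless defined $raw;
    my ($phone) = $raw =~ /^(\+?\d(?:[ -]?\d){6,14})\z/;
    return $phone;    # undef if the input did not match
}

# Sample inputs (made up for the example).
for my $input ('+44 20 7946 0958', '555-CALL-NOW') {
    my $phone = clean_phone($input);
    print defined $phone ? "ok: $phone\n" : "rejected: $input\n";
}
```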
At first you will find taint mode rather frustrating - your script will die for so many extra reasons. After a while, though, you will find it second nature to think secure, and your scripts will be a lot better for it.