Writing Secure CGI Scripts - Untainting data (Page 4 of 4)
As we've already seen, taintedness follows a variable - even if its value is assigned to another variable. There is, however, one way to untaint data - by matching it against a regex and using the captured text.
$something =~ /^([\w.]+)$/;
$cleanvariable = $1;
Here we specify that $something must contain only letters, numbers, underscores, or periods (note that \w does not match whitespace). The ^ and $ anchors force the pattern to cover the whole string, so every character in it must be one of these.
We've also wrapped the [\w.]+ in parentheses, allowing us to make use of the $1, $2, $3 etc. capture variables. In our example $1 contains the whole value of $something (assuming it *did* contain only letters, numbers, underscores, or periods).
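One thing worth guarding against: if the match fails, $1 is not reset and still holds whatever an earlier successful match left in it. A minimal sketch of a safer wrapper follows; the helper name untaint_name and the sample inputs are our own illustrations, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured (untainted) value, or undef
# if the input contains anything other than word characters and periods.
sub untaint_name {
    my ($something) = @_;
    return unless defined $something;
    if ($something =~ /^([\w.]+)$/) {
        return $1;    # only trust $1 when the match actually succeeded
    }
    return;           # match failed: never fall back to a stale $1
}

my $ok  = untaint_name('report.txt');
my $bad = untaint_name('../etc/passwd');   # contains '/', so it is rejected

print defined $ok  ? "accepted: $ok\n"  : "rejected\n";
print defined $bad ? "accepted: $bad\n" : "rejected\n";
```

Returning undef on failure lets the caller decide whether to die, log, or ask the user again.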
The match and the assignment can be shortened a little further...
($cleanvariable) = $something =~ /^([\w.]+)$/;
... since in list context the match returns the captured groups ($1, $2, $3 etc.) as a list.
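The list form has a useful side effect: when the match fails it returns an empty list, so the variable is left undefined instead of picking up an old capture. A short sketch, assuming made-up sample values, and adding a second pattern purely to show several captures arriving at once:

```perl
use strict;
use warnings;

my $something = 'bad value!';   # hypothetical input containing a space and '!'

# A failed match in list context returns an empty list,
# so $cleanvariable stays undefined.
my ($cleanvariable) = $something =~ /^([\w.]+)$/;

if (defined $cleanvariable) {
    print "clean: $cleanvariable\n";
} else {
    print "rejected input\n";
}

# Several capture groups are returned in order, like $1 and $2.
my ($base, $ext) = 'report.txt' =~ /^(\w+)\.(\w+)$/;
print "base=$base ext=$ext\n";
```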
The beauty of taint is that it considers all user input to be unsafe by default, but easily allows us to untaint a variable using a simple regex. Forcing us to perform a pattern match on the variable stops lazy habits from putting our security at risk, and makes us think a little more carefully about just what we expect to find in that data: even though it is not a security risk, it is still good practice to reject a telephone number if it contains letters.
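As a rough illustration of the telephone number case, the sketch below accepts an optional leading +, then digits, spaces, and hyphens. The helper name clean_phone, the permitted characters, and the length limits are all assumptions made for this example; a real application would pick a pattern to suit the numbers it expects.

```perl
use strict;
use warnings;

# Hypothetical validator: returns the captured number, or undef if the
# input contains anything outside the assumed format.
sub clean_phone {
    my ($raw) = @_;
    return unless defined $raw;
    # Optional '+', one digit, then 5 to 18 more digits, spaces or hyphens.
    my ($phone) = $raw =~ /^(\+?[0-9][0-9 -]{5,18})$/;
    return $phone;   # undef when the pattern did not match
}

for my $input ('555-0100', '+44 20 7946 0958', '555-CALL-NOW') {
    my $phone = clean_phone($input);
    print defined $phone ? "ok: $phone\n" : "rejected: $input\n";
}
```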
At first you will find taint mode rather frustrating - your script will die for so many extra reasons - but after a while you will find it second nature to think securely, and your scripts will be a lot better for it.