Thursday, September 16, 2010

Perl Taint Mode Regex

I think I finally have a handle on Perl's taint mode as a result of a couple of scripts I've been working with. I stumbled upon taint mode after reading an article that said that most web-based exploits are the result of programmers (or developers, as the kids like to say) failing to validate input. What taint mode does is mark everything that comes from outside the script as "tainted", and make the script die if tainted data reaches a risky operation (running a command, opening a file for writing, and so on) without being validated first. To invoke taint mode, modify the shebang to read:
#!/usr/bin/perl -T
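If you want to confirm that the switch actually took effect, Perl exposes a read-only built-in variable, ${^TAINT}. Here is a minimal sketch; taint_status is a name I made up for illustration, not anything the original scripts contain:

```perl
use strict;
use warnings;

# ${^TAINT} is a read-only Perl built-in:
#   1 under -T, -1 under -t (taint warnings only), 0 when taint mode is off.
# taint_status is a hypothetical helper name used only for this sketch.
sub taint_status {
    return ${^TAINT} > 0 ? "on"
         : ${^TAINT} < 0 ? "warnings only"
         :                 "off";
}

print "taint mode: ", taint_status(), "\n";
```

Run it once as "perl -T script.pl" and once without the switch to see the difference.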
Now inputs must be laundered:
$validate = $form{"code"};
if ($validate =~ /^(\w*)(3|5|7)(\d{3})$/) {
  $validate = $1 . $2 . $3;   # rebuilt from the captures, so no longer tainted
} else {
  die "can not validate";
}
The first line reads the input, but the input is untrusted. In the conditional, we compare the variable against a pre-defined, expected format. If the input matches the format, the variable is set back to a clean value built from the captured pieces, which Perl treats as untainted. If the validation fails, the script befalls a brutal and senseless death... which is better than being compromised or exploited.
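The same laundering step can be wrapped in a small function and checked with Scalar::Util's tainted(). This is only a sketch under a few assumptions: launder_code is a hypothetical helper name, and the FORM_CODE environment variable is a stand-in for real form input (under -T, environment values count as tainted).

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# launder_code is a hypothetical helper, not part of the original script.
# It returns the laundered value on a match and undef otherwise.
sub launder_code {
    my ($raw) = @_;
    return unless defined $raw;
    if ($raw =~ /^(\w*)(3|5|7)(\d{3})$/) {
        return $1 . $2 . $3;   # built only from captures, so Perl treats it as clean
    }
    return;
}

# FORM_CODE is an assumed variable name standing in for real form input.
my $raw   = $ENV{FORM_CODE} // "abc3123";
my $clean = launder_code($raw);
die "can not validate\n" unless defined $clean;

printf "raw tainted: %s, clean tainted: %s\n",
    (tainted($raw)   ? "yes" : "no"),
    (tainted($clean) ? "yes" : "no");
```

Without -T nothing is ever tainted, so both answers come back "no"; the interesting run is "FORM_CODE=abc3123 perl -T script.pl".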

The trick to this process is understanding how to format the regex and understanding how the input is laundered. First, the format pattern is not quite the regex you may know from grep or sed. Sure, all the docs call it regex... but it's Perl's own flavor, with its own extras. So, here's what you need to know:
the format sits between / /
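the ^ and $ are anchors (start and end of the string), as in any regex
the ( ) encloses a check (a capture group, in the docs' terms)
the checks are numbered, left to right
the first check is $1, second is $2, etc
if there is a | in a check, it's an "or"
the * means zero or more of whatever comes just before it
but {3} says exactly 3 of whatever comes just before it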
Let's look at the example above:
^(\w*)(3|5|7)(\d{3})$
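Start at the beginning, grab as many \w characters as possible (zero or more) and assign them to $1. Look for a 3 or a 5 or a 7 and assign it to $2 (second set of parens, thus second check). The last check ensures that the last three characters are digits and assigns them to $3. Remember that regex is "greedy": \w* first swallows the whole string (\w covers letters, digits, and the underscore), then gives characters back one at a time until the rest of the pattern fits. The expression is still evaluated left to right, but the end result looks as if it had been worked out backwards from the end of the string.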
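To see where the greedy match ends up putting each piece, here is a quick sketch that prints the three checks for a few sample inputs. The helper name split_code and the sample strings are mine, chosen only for illustration:

```perl
use strict;
use warnings;

# split_code is a hypothetical helper: it returns the three captures
# as a list, or an empty list when the input does not match.
sub split_code {
    my ($input) = @_;
    return $input =~ /^(\w*)(3|5|7)(\d{3})$/ ? ($1, $2, $3) : ();
}

for my $input ("abc3123", "x95123", "7000", "abc9123") {
    my @parts = split_code($input);
    if (@parts) {
        print "$input => \$1='$parts[0]' \$2='$parts[1]' \$3='$parts[2]'\n";
    } else {
        print "$input => no match\n";
    }
}
```

Note the second sample: because \w also matches digits, the 9 in "x95123" lands in $1, and only the character sitting right before the final three digits has to be a 3, 5 or 7.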

Now that you've validated the input against the pattern, reassign the checks ($1.$2.$3) back to the variable. This nukes whatever badness the evildoer tried to impose on you. Do this for every variable you read in, and then destroy the input hash, to ensure that no lazy developer slides a new form value into the script without validating.
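One way to do this for every field at once is to keep a table of patterns, copy the validated values into a fresh hash, and then empty the original. This is a rough sketch, not the way my scripts necessarily do it: the field names, the zip pattern, and the helper name launder_form are all assumptions made up for the example.

```perl
use strict;
use warnings;

# Field names and patterns are illustrative assumptions. Each pattern
# has exactly one capture group spanning the whole acceptable value.
my %rules = (
    code => qr/^(\w*[357]\d{3})$/,
    zip  => qr/^(\d{5})$/,
);

# launder_form is a hypothetical helper: it validates every field named
# in the rules, returns the clean copies, and empties the caller's hash
# so anything without a rule is simply gone.
sub launder_form {
    my ($form, $rules) = @_;
    my %clean;
    for my $name (sort keys %$rules) {
        my $value = defined $form->{$name} ? $form->{$name} : '';
        $value =~ $rules->{$name}
            or die "can not validate '$name'\n";
        $clean{$name} = $1;    # captured text is what Perl considers clean
    }
    %$form = ();               # nothing unvalidated is left to read later
    return %clean;
}

# Stand-in for parsed form input; "sneaky" has no rule and gets dropped.
my %form  = (code => "abc3123", zip => "90210", sneaky => "; rm -rf /");
my %clean = launder_form(\%form, \%rules);
print "$_ = $clean{$_}\n" for sort keys %clean;
```

The point of the design is that the rest of the script only ever sees %clean, so a new form value cannot be used until someone adds a pattern for it.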

To test match patterns from Bash, try this one-liner:
perl -e 'if("test01" =~ /^(\w{4}\d{2})$/ ){
  print "+ $1.$2.$3.$4.$5"}else{
  print "- $1.$2.$3.$4.$5"}'; echo
Simple, huh? Let's do an e-mail address:
perl -e \
'if( "xxx\@yyy.us" =~ /^(\w{1}[\w\-\.\_]+\w\@)(\w{1}[\w\-\.\_]+\.)(us|com|net)$/ ){
  print "+ $1.$2.$3.$4.$5"}else{
  print "- $1.$2.$3.$4.$5"}'; echo
Ouch! (BTW: it was Perl, not Bash, that made me escape the @ symbol. Inside a double-quoted Perl string, @yyy would be read as an array.)
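Once the patterns get this long, a small script is easier to live with than a one-liner. Here is a sketch that runs the same e-mail pattern against a few inputs; check_pattern and the sample addresses are hypothetical, picked to show what this particular pattern accepts and rejects:

```perl
use strict;
use warnings;

# Same e-mail pattern as the one-liner, stored as a compiled regex.
my $email = qr/^(\w{1}[\w\-\.\_]+\w\@)(\w{1}[\w\-\.\_]+\.)(us|com|net)$/;

# check_pattern is a hypothetical helper: it returns the captures
# joined by "|" on a match, or undef when the input does not match.
sub check_pattern {
    my ($pattern, $input) = @_;
    my @parts = $input =~ $pattern;
    return @parts ? join("|", @parts) : undef;
}

# Single-quoted strings, so the @ needs no escaping here.
for my $input ('xxx@yyy.us', 'ab@yyy.us', 'xxx@yyy.org') {
    my $result = check_pattern($email, $input);
    print defined($result) ? "+ $input => $result\n" : "- $input\n";
}
```

The second address fails because the first check demands at least three characters before the @, and the third fails because only us, com and net are allowed at the end. Whether those limits are what you want is a separate question from whether the laundering works.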

For a breakdown of all the pattern matches, check out Steve Litt's Perls of Wisdom.
