Home arrow Perl Programming arrow Page 3 - Parsing a Querystring With Perl

A Simple POST Query&toc - Perl

Having trouble parsing a querystring with Perl? If so, then Jeff's step-by-step guide will save you headaches and have you up and running in no time!

TABLE OF CONTENTS:
  1. Parsing a Querystring With Perl
  2. A Simple ISINDEX Query
  3. A Simple POST Query
By: Jeff Pinyan
Rating: starstarstarstarstar / 7
December 18, 2002

print this article
SEARCH DEV SHED

TOOLS YOU CAN USE

advertisement
Content-type: application/x-www-form-urlencoded
Content-length: 40

name=Jeff+Pinyan&email=japhy%40pobox.com


Decoding an a/xwfu POST query is much like the GET method, except that we must first read CONTENT_LENGTH bytes from standard input. That acts as our pseudo query string.

It is important to know that a POST query can be made at the same time as a GET query, by placing data in the query string. This is valid, and deserves to be decoded.

# %kv_pairs = simple_post_query($squash, $strip);

sub simple_post_query {
my ($squash,$strip) = @_;
read STDIN, my($str), $ENV{CONTENT_LENGTH};
my %kv;

# ; to & translation
$str =~ tr/;/&/;

# & squishing
$str =~ tr/&//s if $squash;

# leading/trailing & removal
$str =~ s/^&+//, $str =~ s/&+$// if $strip;

# for each k=v pair
for (split /&/, $str) {
# third arg of '2' because $_ might be 'a=b=c'
my ($k,$v) = split /=/, $_, 2;

# don't allow for blank key
next if $k eq "";

($k,$v) = map url_decode, ($k,$v);

if (not exists $kv{$k}) { $kv{$k} = $v }
elsif (not ref $kv{$k}) { $kv{$k} = [ $kv{$k}, $v ] }
else { push @{ $kv{$k} }, $v }
}

return %kv;
}


See? Hardly nothing new, except for a call to read() and the use of the $ENV{CONTENT_LENGTH} variable. I prefixed the function with simple_, because the m/fd query parser is going to be much more involving.

A Complex POST Query

CONTENT_LENGTH => 344
CONTENT_TYPE => multipart/form-data; boundary=5154532515249

--5154532515249
Content-Disposition: form-data; name="feature"

123
--5154532515249
Content-Disposition: form-data; name="comment"

what's up?
--5154532515249
Content-Disposition: form-data; name="foobar"

456
--5154532515249--


The real difficulty comes in parsing an m/fd POST query, because different browsers behave in different ways. This is A Bad Thing, since specifications are supposed to be adhered to. But we will have to make do, and get around special cases whenever possible. Some of the discrepancies are:
  • MSIE 3.01 and 3.02 on the Macintosh don't use two leading hyphens in the boundary string
  • Many browsers don't escape " characters in the field names or filenames, which creates quite a problem when trying to get the value between quotes
  • Certain browsers (such as MSIE 5.0 for Win32) automatically remove fields with no name (like <input type="text" name="">)
As for our quoting problem, no work-around seems to exist. Imagine the following HTML tag:

<input type="text" name='something"; filename="foo.bar'>

A m/fd POST query would contain

Content-Disposition: form-data; name="something"; filename="foo.bar"

We just fooled a query parser into thinking that there's a file upload present. That's not very nice of us, but then, the browser isn't too smart. Sadly, this has no workaround. We can only hope that browsers start to escape their quotes -- and we will make a pseudo-browser that does this, later. For now, let's look at the code:

# %kv_pairs = complex_post_query($strip)

sub complex_post_query {
my ($strip) = @_;
my ($CRLF,$boundary,%kv);

# different OSs define \r and \n differently
# so adjust to make sure we get the line-ending right
$CRLF = $^O =~ /VMS/i ? "\n" : # VMS
"\t" ne "\011" ? "\r\n" : # EBCDIC (non-ASCII)
"\015\012"; # others

# for reading from STDIN
local $/ = $CRLF;

# for reading binary data on sensitive OSs
binmode STDIN if $^O =~ /^(?:WIN|VMS|OS2)/i;

# Mac MSIE 3.01/3.02 doesn't put '--' at
# the beginning of the boundary string
# (so says Lincoln Stein in CGI.pm)
chomp($boundary = );
$boundary =~ s/^--// if
$ENV{HTTP_USER_AGENT} =~ /MSIE\s+3\.0[12];\s*Mac/;

FORM_DATA:
while (1) {
my (%hd,$header,$value,$param,$filename,$skip);

# parse headers
while () {
chomp;
last if /^$/;

# header continutation (see RFC 822 3.4.8
# on wrapping long header lines)
$hd{$header} .= $_, next if s/^\s+/ /;

($header,$value) = split /:\s+/, $_, 2;

# change Content-type to CONTENT_TYPE
($header = uc $header) =~ tr/-/_/;

$hd{$header} = $value;
}

# avoid quotes in the fieldname and filename, PLEASE
$hd{CONTENT_DISPOSITION} =~
/ name=(?:"([^\\"]*(?:\\.[^\\"]*)*)"|([^\s;]*))/i and
$param = $+;
$hd{CONTENT_DISPOSITION} =~
/ filename=(?:"([^\\"]*(?:\\.[^\\"]*)*)"|([^\s;]*))/i and
$filename = $+;

# some versions of MSIE do this automatically :(
$skip = 1 if $strip and (
$param eq "" or
(defined $filename and $filename eq "")
);

$kv{$param} = { HEADERS => \%hd } unless $skip;

# file upload not supported (yet)
next if defined $filename;

# here's the actual data
while () {
chomp;

# go to next form element
$_ eq $boundary and next FORM_DATA;

# done with form processing
$_ eq "$boundary--" and last FORM_DATA;

# if we don't care about this element
next if $skip;

if (not exists $kv{$param}{VALUE}) {
$kv{$param}{VALUE} = $_;
}
elsif (not ref $kv{$param}{VALUE}) {
$kv{$param}{VALUE} = [ $kv{$param}, $_ ];
}
else {
push @{ $kv{$param}{VALUE} }, $_;
}
}
}

return %kv;
}


You'll notice I go through great lengths to find the name and filename fields in the header. Let's examine this regular expression:

(?:
" Match a "
( Save to $1
[^\\"]* 0 or more non-\ and non-" chars
(?:
\\. \ followed by a char
[^\\"]* 0 or more non-\ and non-" chars
)* This group 0 or more times
) End of $1
" Followed by a "
| OR...
( Save to $2
[^\s;]* 0 or more non-whitespace and non-; chars
) End of $2
)

We store $+ to the variable, which is the last parenthesized pattern matched, so it's a faster way of saying:

$param = defined $1 ? $1 : $2;

This regex will break if the field name or filename has a " in it, but common sense should dictate that you don't do that anyway.{mospagebreak title=A File-Upload POST Query&toc=1} CONTENT_LENGTH => 166
CONTENT_TYPE => multipart/form-data; boundary=xyzzy

--xyzzy
Content-Disposition: form-data; name="to_save"; filename="c:\me.html"
Content-Type: text/html

<html>
<body>
Not Much Here
</body>
</html>
--xyzzy--


This is not complex, really, but it involves a bit more care than the normal fields. The main point is "what do we do with the file that is uploaded?" I suggest we make a temporary file, put the data in it, and then return the name of the temporary file. This is only for the first draft of this function. I will show you later how to turn this into a more stylish solution.

One potential problem is the cleaning up of the temporary files when the program ends. We'll see how to do that later.

# file_upload(\%data, $filename, $boundary)

sub file_upload {
my ($data,$file,$stop) = @_;

# XXX: Unix-specific -- this needs fixing
my $tmpfile = "/tmp/CGI-temp-$$-" . time;

$data->{NAME} = $file;

open TMPFILE, "> $tmpfile" or warn("can't save to $tmpfile: $!"), return;
binmode TMPFILE if $^O =~ /WIN|VMS|OS2/i;
while () {
last if $_ eq "$stop$/" or $_ eq "$stop--$/";
print TMPFILE;
}
close TMPFILE;

$data->{FILENAME} = $tmpfile;
}


 
 
>>> More Perl Programming Articles          >>> More By Jeff Pinyan
 

blog comments powered by Disqus
escort Bursa Bursa escort Antalya eskort
   

PERL PROGRAMMING ARTICLES

- Perl Turns 25
- Lists and Arguments in Perl
- Variables and Arguments in Perl
- Understanding Scope and Packages in Perl
- Arguments and Return Values in Perl
- Invoking Perl Subroutines and Functions
- Subroutines and Functions in Perl
- Perl Basics: Writing and Debugging Programs
- Structure and Statements in Perl
- First Steps in Perl
- Completing Regular Expression Basics
- Modifiers, Boundaries, and Regular Expressio...
- Quantifiers and Other Regular Expression Bas...
- Parsing and Regular Expression Basics
- Hash Functions

Developer Shed Affiliates

 


Dev Shed Tutorial Topics: