Perl Programming Page 3 - Parsing a Querystring With Perl |
Content-type: application/x-www-form-urlencoded Content-length: 40 name=Jeff+Pinyan&email=japhy%40pobox.com Decoding an a/xwfu POST query is much like the GET method, except that we must first read CONTENT_LENGTH bytes from standard input. That acts as our pseudo query string. It is important to know that a POST query can be made at the same time as a GET query, by placing data in the query string. This is valid, and deserves to be decoded. # %kv_pairs = simple_post_query($squash, $strip); sub simple_post_query { my ($squash,$strip) = @_; read STDIN, my($str), $ENV{CONTENT_LENGTH}; my %kv; # ; to & translation $str =~ tr/;/&/; # & squishing $str =~ tr/&//s if $squash; # leading/trailing & removal $str =~ s/^&+//, $str =~ s/&+$// if $strip; # for each k=v pair for (split /&/, $str) { # third arg of '2' because $_ might be 'a=b=c' my ($k,$v) = split /=/, $_, 2; # don't allow for blank key next if $k eq ""; ($k,$v) = map url_decode, ($k,$v); if (not exists $kv{$k}) { $kv{$k} = $v } elsif (not ref $kv{$k}) { $kv{$k} = [ $kv{$k}, $v ] } else { push @{ $kv{$k} }, $v } } return %kv; } See? Hardly nothing new, except for a call to read() and the use of the $ENV{CONTENT_LENGTH} variable. I prefixed the function with simple_, because the m/fd query parser is going to be much more involving. A Complex POST Query CONTENT_LENGTH => 344 CONTENT_TYPE => multipart/form-data; boundary=5154532515249 --5154532515249 Content-Disposition: form-data; name="feature" 123 --5154532515249 Content-Disposition: form-data; name="comment" what's up? --5154532515249 Content-Disposition: form-data; name="foobar" 456 --5154532515249-- The real difficulty comes in parsing an m/fd POST query, because different browsers behave in different ways. This is A Bad Thing, since specifications are supposed to be adhered to. But we will have to make do, and get around special cases whenever possible. Some of the discrepancies are:
<input type="text" name='something"; filename="foo.bar'> A m/fd POST query would contain Content-Disposition: form-data; name="something"; filename="foo.bar" We just fooled a query parser into thinking that there's a file upload present. That's not very nice of us, but then, the browser isn't too smart. Sadly, this has no workaround. We can only hope that browsers start to escape their quotes -- and we will make a pseudo-browser that does this, later. For now, let's look at the code: # %kv_pairs = complex_post_query($strip) sub complex_post_query { my ($strip) = @_; my ($CRLF,$boundary,%kv); # different OSs define \r and \n differently # so adjust to make sure we get the line-ending right $CRLF = $^O =~ /VMS/i ? "\n" : # VMS "\t" ne "\011" ? "\r\n" : # EBCDIC (non-ASCII) "\015\012"; # others # for reading from STDIN local $/ = $CRLF; # for reading binary data on sensitive OSs binmode STDIN if $^O =~ /^(?:WIN|VMS|OS2)/i; # Mac MSIE 3.01/3.02 doesn't put '--' at # the beginning of the boundary string # (so says Lincoln Stein in CGI.pm) chomp($boundary = ); $boundary =~ s/^--// if $ENV{HTTP_USER_AGENT} =~ /MSIE\s+3\.0[12];\s*Mac/; FORM_DATA: while (1) { my (%hd,$header,$value,$param,$filename,$skip); # parse headers while () { chomp; last if /^$/; # header continutation (see RFC 822 3.4.8 # on wrapping long header lines) $hd{$header} .= $_, next if s/^\s+/ /; ($header,$value) = split /:\s+/, $_, 2; # change Content-type to CONTENT_TYPE ($header = uc $header) =~ tr/-/_/; $hd{$header} = $value; } # avoid quotes in the fieldname and filename, PLEASE $hd{CONTENT_DISPOSITION} =~ / name=(?:"([^\\"]*(?:\\.[^\\"]*)*)"|([^\s;]*))/i and $param = $+; $hd{CONTENT_DISPOSITION} =~ / filename=(?:"([^\\"]*(?:\\.[^\\"]*)*)"|([^\s;]*))/i and $filename = $+; # some versions of MSIE do this automatically :( $skip = 1 if $strip and ( $param eq "" or (defined $filename and $filename eq "") ); $kv{$param} = { HEADERS => \%hd } unless $skip; # file upload not supported (yet) next if defined $filename; # here's the actual data while () { chomp; # go to next form element $_ eq $boundary and next FORM_DATA; # done with form processing $_ eq "$boundary--" and last FORM_DATA; # if we don't care about this element next if $skip; if (not exists $kv{$param}{VALUE}) { $kv{$param}{VALUE} = $_; } elsif (not ref $kv{$param}{VALUE}) { $kv{$param}{VALUE} = [ $kv{$param}, $_ ]; } else { push @{ $kv{$param}{VALUE} }, $_; } } } return %kv; } You'll notice I go through great lengths to find the name and filename fields in the header. Let's examine this regular expression: (?: " Match a " ( Save to $1 [^\\"]* 0 or more non-\ and non-" chars (?: \\. \ followed by a char [^\\"]* 0 or more non-\ and non-" chars )* This group 0 or more times ) End of $1 " Followed by a " | OR... ( Save to $2 [^\s;]* 0 or more non-whitespace and non-; chars ) End of $2 ) We store $+ to the variable, which is the last parenthesized pattern matched, so it's a faster way of saying: $param = defined $1 ? $1 : $2; This regex will break if the field name or filename has a " in it, but common sense should dictate that you don't do that anyway.{mospagebreak title=A File-Upload POST Query&toc=1} CONTENT_LENGTH => 166 CONTENT_TYPE => multipart/form-data; boundary=xyzzy --xyzzy Content-Disposition: form-data; name="to_save"; filename="c:\me.html" Content-Type: text/html <html> <body> Not Much Here </body> </html> --xyzzy-- This is not complex, really, but it involves a bit more care than the normal fields. The main point is "what do we do with the file that is uploaded?" I suggest we make a temporary file, put the data in it, and then return the name of the temporary file. This is only for the first draft of this function. I will show you later how to turn this into a more stylish solution. One potential problem is the cleaning up of the temporary files when the program ends. We'll see how to do that later. # file_upload(\%data, $filename, $boundary) sub file_upload { my ($data,$file,$stop) = @_; # XXX: Unix-specific -- this needs fixing my $tmpfile = "/tmp/CGI-temp-$$-" . time; $data->{NAME} = $file; open TMPFILE, "> $tmpfile" or warn("can't save to $tmpfile: $!"), return; binmode TMPFILE if $^O =~ /WIN|VMS|OS2/i; while () { last if $_ eq "$stop$/" or $_ eq "$stop--$/"; print TMPFILE; } close TMPFILE; $data->{FILENAME} = $tmpfile; }
blog comments powered by Disqus |
|
|
|
|
|
|
|