Perl
  Home arrow Perl arrow Page 3 - Parsing a Querystring With Perl
FaxWave - Free Trial.
Dev Shed Forums 
Administration  
Apache  
BrainDump  
DHTML  
Flash  
Java  
JavaScript  
Multimedia  
MySQL  
Oracle  
Perl  
PHP  
Practices  
Python  
Reviews  
Security  
Style-Sheets  
Web Services  
XML  
Zend  
Zope  
Forums Sitemap 
IBM® developerWorks 
Dedicated Servers 
E-Commerce Hosting 
Linux Web Hosting 
Managed Hosting 
Small Business Hosting 
Download TestComplete 
VPS Hosting 
Weekly Newsletter

 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
PERL

Parsing a Querystring With Perl
By: Jeff Pinyan
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 3 stars3 stars3 stars3 stars3 stars / 6
    2002-12-18

    Table of Contents:
  • Parsing a Querystring With Perl
  • A Simple ISINDEX Query
  • A Simple POST Query

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
     
    ADVERTISEMENT

    Dell PowerEdge Servers

    Parsing a Querystring With Perl - A Simple POST Query
    (Page 3 of 3 )

    Content-type: application/x-www-form-urlencoded
    Content-length: 40

    name=Jeff+Pinyan&email=japhy%40pobox.com


    Decoding an a/xwfu POST query is much like the GET method, except that we must first read CONTENT_LENGTH bytes from standard input. That acts as our pseudo query string.

    It is important to know that a POST query can be made at the same time as a GET query, by placing data in the query string. This is valid, and deserves to be decoded.

    # %kv_pairs = simple_post_query($squash, $strip);

    sub simple_post_query {
    my ($squash,$strip) = @_;
    read STDIN, my($str), $ENV{CONTENT_LENGTH};
    my %kv;

    # ; to & translation
    $str =~ tr/;/&/;

    # & squishing
    $str =~ tr/&//s if $squash;

    # leading/trailing & removal
    $str =~ s/^&+//, $str =~ s/&+$// if $strip;

    # for each k=v pair
    for (split /&/, $str) {
    # third arg of '2' because $_ might be 'a=b=c'
    my ($k,$v) = split /=/, $_, 2;

    # don't allow for blank key
    next if $k eq "";

    ($k,$v) = map url_decode, ($k,$v);

    if (not exists $kv{$k}) { $kv{$k} = $v }
    elsif (not ref $kv{$k}) { $kv{$k} = [ $kv{$k}, $v ] }
    else { push @{ $kv{$k} }, $v }
    }

    return %kv;
    }


    See? Hardly nothing new, except for a call to read() and the use of the $ENV{CONTENT_LENGTH} variable. I prefixed the function with simple_, because the m/fd query parser is going to be much more involving.

    A Complex POST Query

    CONTENT_LENGTH => 344
    CONTENT_TYPE => multipart/form-data; boundary=5154532515249

    --5154532515249
    Content-Disposition: form-data; name="feature"

    123
    --5154532515249
    Content-Disposition: form-data; name="comment"

    what's up?
    --5154532515249
    Content-Disposition: form-data; name="foobar"

    456
    --5154532515249--


    The real difficulty comes in parsing an m/fd POST query, because different browsers behave in different ways. This is A Bad Thing, since specifications are supposed to be adhered to. But we will have to make do, and get around special cases whenever possible. Some of the discrepancies are:
    • MSIE 3.01 and 3.02 on the Macintosh don't use two leading hyphens in the boundary string
    • Many browsers don't escape " characters in the field names or filenames, which creates quite a problem when trying to get the value between quotes
    • Certain browsers (such as MSIE 5.0 for Win32) automatically remove fields with no name (like <input type="text" name="">)
    As for our quoting problem, no work-around seems to exist. Imagine the following HTML tag:

    <input type="text" name='something"; filename="foo.bar'>

    A m/fd POST query would contain

    Content-Disposition: form-data; name="something"; filename="foo.bar"

    We just fooled a query parser into thinking that there's a file upload present. That's not very nice of us, but then, the browser isn't too smart. Sadly, this has no workaround. We can only hope that browsers start to escape their quotes -- and we will make a pseudo-browser that does this, later. For now, let's look at the code:

    # %kv_pairs = complex_post_query($strip)

    sub complex_post_query {
    my ($strip) = @_;
    my ($CRLF,$boundary,%kv);

    # different OSs define \r and \n differently
    # so adjust to make sure we get the line-ending right
    $CRLF = $^O =~ /VMS/i ? "\n" : # VMS
    "\t" ne "\011" ? "\r\n" : # EBCDIC (non-ASCII)
    "\015\012"; # others

    # for reading from STDIN
    local $/ = $CRLF;

    # for reading binary data on sensitive OSs
    binmode STDIN if $^O =~ /^(?:WIN|VMS|OS2)/i;

    # Mac MSIE 3.01/3.02 doesn't put '--' at
    # the beginning of the boundary string
    # (so says Lincoln Stein in CGI.pm)
    chomp($boundary = );
    $boundary =~ s/^--// if
    $ENV{HTTP_USER_AGENT} =~ /MSIE\s+3\.0[12];\s*Mac/;

    FORM_DATA:
    while (1) {
    my (%hd,$header,$value,$param,$filename,$skip);

    # parse headers
    while () {
    chomp;
    last if /^$/;

    # header continutation (see RFC 822 3.4.8
    # on wrapping long header lines)
    $hd{$header} .= $_, next if s/^\s+/ /;

    ($header,$value) = split /:\s+/, $_, 2;

    # change Content-type to CONTENT_TYPE
    ($header = uc $header) =~ tr/-/_/;

    $hd{$header} = $value;
    }

    # avoid quotes in the fieldname and filename, PLEASE
    $hd{CONTENT_DISPOSITION} =~
    / name=(?:"([^\\"]*(?:\\.[^\\"]*)*)"|([^\s;]*))/i and
    $param = $+;
    $hd{CONTENT_DISPOSITION} =~
    / filename=(?:"([^\\"]*(?:\\.[^\\"]*)*)"|([^\s;]*))/i and
    $filename = $+;

    # some versions of MSIE do this automatically :(
    $skip = 1 if $strip and (
    $param eq "" or
    (defined $filename and $filename eq "")
    );

    $kv{$param} = { HEADERS => \%hd } unless $skip;

    # file upload not supported (yet)
    next if defined $filename;

    # here's the actual data
    while () {
    chomp;

    # go to next form element
    $_ eq $boundary and next FORM_DATA;

    # done with form processing
    $_ eq "$boundary--" and last FORM_DATA;

    # if we don't care about this element
    next if $skip;

    if (not exists $kv{$param}{VALUE}) {
    $kv{$param}{VALUE} = $_;
    }
    elsif (not ref $kv{$param}{VALUE}) {
    $kv{$param}{VALUE} = [ $kv{$param}, $_ ];
    }
    else {
    push @{ $kv{$param}{VALUE} }, $_;
    }
    }
    }

    return %kv;
    }


    You'll notice I go through great lengths to find the name and filename fields in the header. Let's examine this regular expression:

    (?:
    " Match a "
    ( Save to $1
    [^\\"]* 0 or more non-\ and non-" chars
    (?:
    \\. \ followed by a char
    [^\\"]* 0 or more non-\ and non-" chars
    )* This group 0 or more times
    ) End of $1
    " Followed by a "
    | OR...
    ( Save to $2
    [^\s;]* 0 or more non-whitespace and non-; chars
    ) End of $2
    )

    We store $+ to the variable, which is the last parenthesized pattern matched, so it's a faster way of saying:

    $param = defined $1 ? $1 : $2;

    This regex will break if the field name or filename has a " in it, but common sense should dictate that you don't do that anyway.{mospagebreak title=A File-Upload POST Query&toc=1} CONTENT_LENGTH => 166
    CONTENT_TYPE => multipart/form-data; boundary=xyzzy

    --xyzzy
    Content-Disposition: form-data; name="to_save"; filename="c:\me.html"
    Content-Type: text/html

    <html>
    <body>
    Not Much Here
    </body>
    </html>
    --xyzzy--


    This is not complex, really, but it involves a bit more care than the normal fields. The main point is "what do we do with the file that is uploaded?" I suggest we make a temporary file, put the data in it, and then return the name of the temporary file. This is only for the first draft of this function. I will show you later how to turn this into a more stylish solution.

    One potential problem is the cleaning up of the temporary files when the program ends. We'll see how to do that later.

    # file_upload(\%data, $filename, $boundary)

    sub file_upload {
    my ($data,$file,$stop) = @_;

    # XXX: Unix-specific -- this needs fixing
    my $tmpfile = "/tmp/CGI-temp-$$-" . time;

    $data->{NAME} = $file;

    open TMPFILE, "> $tmpfile" or warn("can't save to $tmpfile: $!"), return;
    binmode TMPFILE if $^O =~ /WIN|VMS|OS2/i;
    while () {
    last if $_ eq "$stop$/" or $_ eq "$stop--$/";
    print TMPFILE;
    }
    close TMPFILE;

    $data->{FILENAME} = $tmpfile;
    }

    DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.

     

       

    PERL ARTICLES

    - Perl: A Continuing Look at Hashes and Multid...
    - Perl: Another Round with Hashes
    - Perl Hashes
    - Perl Lists: A Final Look at List::Util
    - Perl Lists: Utilizing List::Util
    - Perl Lists: The Split() Function
    - SQL and CGI with Perl and DBI
    - Perl Lists: More Functions and Operators
    - SELECT Queries and Perl
    - Perl Lists: More on Manipulation
    - Creating a Database with Perl and DBI
    - Perl: Sailing the List(less) Seas
    - Perl and DBI
    - Perl: Concatenating Text and More
    - Perl Text: Quoting Without Quote Marks




    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway