Parsing a Querystring With Perl
By: Jeff Pinyan
    2002-12-18


    A Simple ISINDEX Query

    http://www.server.com/cgi-bin/prog?key1+key2+key3

    We already know the arguments are stored in the @ARGV array, so there is technically no need to parse the $ENV{QUERY_STRING} variable. However, the parsing here will be the simplest we encounter.

    I mentioned that some servers (such as Apache 1.3.1 for Win32 systems) will allow space characters (character 32) as well as ampersands (character 38) in between keywords. To allow for this, pass a true value (1, for example) as the first argument to the function ($amp). Your system may also expect just a single space between keywords; to allow for multiple spaces (and ampersands, if the first option is true), pass a true value as the second argument ($squash). The final option removes leading and trailing spaces, and is enabled by passing a true value as the third argument ($strip). In the query string itself, spaces arrive encoded as + and ampersands as %26, which is why the code works on those forms.
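    To make the squash and strip options concrete, here is a minimal sketch of what they do to a raw query string. The sample string is hypothetical, and the two steps are applied by hand rather than through the function.

```perl
use strict;
use warnings;

# Hypothetical query with doubled, leading, and trailing separators.
my $str = '+perl++cgi+++query+';

# Without any cleanup, split() yields empty keywords between repeated + signs.
my @raw = split /\+/, $str;

# Squash: collapse each run of + into a single + (tr with the /s modifier).
(my $squashed = $str) =~ tr/+//s;

# Strip: remove + signs from both ends of the string.
(my $stripped = $squashed) =~ s/^\++|\++$//g;

my @keywords = split /\+/, $stripped;

print scalar(@raw), " raw fields\n";        # 7 raw fields
print join(',', @keywords), "\n";           # perl,cgi,query
```

    The raw split keeps the leading empty field and the empty fields between repeated separators, which is the noise the two options are meant to remove.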

    This function also makes adjustments for an ISMAPed image that sends XX,YY as the query string.
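    As a small illustration of the ISMAP case, the sketch below applies an anchored XX,YY pattern to a hypothetical click at x=120, y=45 and to an ordinary keyword query. The variable names are only for this example.

```perl
use strict;
use warnings;

# In list context a successful match returns its captures.
my ($x, $y) = '120,45' =~ /^(\d+),(\d+)$/;

# A keyword query does not match, so the list comes back empty.
my @none = 'key1+key2+key3' =~ /^(\d+),(\d+)$/;

print "x=$x y=$y\n";                                   # x=120 y=45
print "keyword query matched: ", scalar(@none), "\n";  # keyword query matched: 0
```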

    While the @ARGV array has shell characters escaped, this function merely decodes the query, and does not escape characters.

    # @keywords = isindex_query($amp, $squash, $strip);

    sub isindex_query {
        my ($amp, $squash, $strip) = @_;
        my $str = $ENV{QUERY_STRING};
        my @kw;

        # handle XX,YY
        if ($str =~ /^(\d+),(\d+)$/) {
            return ($1, $2);
        }

        # change %26 (encoding for ampersand) to a + character
        $str =~ s/%26/+/g if $amp;

        # squish more than one + into one
        $str =~ tr/+//s if $squash;

        # remove leading and trailing + signs
        $str =~ s/^\++//, $str =~ s/\++$// if $strip;

        # split query string by + signs
        @kw = split /\+/, $str;

        # return decoded keywords (the explicit call form works even
        # though url_decode() is defined further down the file)
        return map { url_decode($_) } @kw;
    }


    We should define the url_decode() and url_encode() functions right now, too, since they will be used over and over.

    # $decoded = url_decode($string);
    # $decoded = url_decode;

    sub url_decode {
        # default argument is $_
        local $_ = @_ ? shift : $_;
        defined or return;

        # change + signs to spaces
        tr/+/ /;

        # change hex escapes to the proper characters
        s/%([a-fA-F0-9]{2})/pack "H2", $1/eg;

        return $_;
    }
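    The order of the two decoding steps matters: plus signs have to become spaces before the hex escapes are expanded, otherwise an escaped literal plus (%2B) would be turned into a space as well. A minimal sketch with a hypothetical input, applying the two steps by hand in both orders:

```perl
use strict;
use warnings;

# Hypothetical encoded value: "C++ rocks" with the literal plus signs escaped.
my $encoded = 'C%2B%2B+rocks';

# Same order as url_decode(): + to space first, then hex escapes.
(my $right = $encoded) =~ tr/+/ /;
$right =~ s/%([a-fA-F0-9]{2})/pack "H2", $1/eg;

# Reversed order: %2B becomes + first, and tr then turns it into a space.
(my $wrong = $encoded) =~ s/%([a-fA-F0-9]{2})/pack "H2", $1/eg;
$wrong =~ tr/+/ /;

print "right: [$right]\n";   # right: [C++ rocks]
print "wrong: [$wrong]\n";   # wrong: [C   rocks]
```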


    The url_encode() routine:

    # $encoded = url_encode($string);
    # $encoded = url_encode;

    sub url_encode {
        # default argument is $_
        local $_ = @_ ? shift : $_;
        defined or return;

        # change unsafe characters (except for space) to encoded value;
        # the character is captured so ord($1) sees it, and the hyphen
        # sits last in the class so it is taken literally
        s/([^ a-zA-Z0-9._!~*'()-])/sprintf '%%%02X', ord($1)/eg;

        # change spaces to +
        tr/ /+/;

        return $_;
    }
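    A minimal sketch of the encoding step on its own, written with an explicit capture group and with the hyphen placed last in the character class so that it is read as a literal character. The input string is hypothetical.

```perl
use strict;
use warnings;

my $raw = 'japhy@pobox.com & friends';   # hypothetical input

# Escape everything except space and the characters left unencoded.
(my $encoded = $raw) =~ s/([^ a-zA-Z0-9._!~*'()-])/sprintf '%%%02X', ord($1)/eg;

# Spaces become + signs.
$encoded =~ tr/ /+/;

print "$encoded\n";   # japhy%40pobox.com+%26+friends
```

    Spaces are excluded from the first substitution so that they can be written as + rather than %20.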


    A GET query
    http://www.server.com/cgi-bin/prog?name=Jeff+Pinyan&email=japhy%40pobox.com

    For a GET query, we need to figure out how elements are separated. The simplest method is to split() on & or ; to get the pairs, and then split each pair on = to get the field and its value.

    # %kv_pairs = get_query($squash, $strip);

    sub get_query {
        my ($squash, $strip) = @_;
        my $str = $ENV{QUERY_STRING};
        my %kv;

        # & and ; squishing
        $str =~ tr/&;/&/s if $squash;

        # leading/trailing & and ; removal
        $str =~ s/^[&;]+//, $str =~ s/[&;]+$// if $strip;

        # for each k=v pair
        for (split /[&;]/, $str) {
            # third arg of '2' because $_ might be 'a=b=c'
            my ($k, $v) = split /=/, $_, 2;

            # don't allow for blank key (an empty pair leaves $k undefined)
            next unless defined $k and length $k;

            # XXX: this only allows one value per key!
            $kv{url_decode($k)} = url_decode($v);
        }

        return %kv;
    }
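    The single-value limitation flagged by the XXX comment is easy to see in isolation. The sketch below stores each value with a plain hash assignment, and then shows why joining values with a delimiter can be ambiguous. The comma-containing values are hypothetical.

```perl
use strict;
use warnings;

my $query = 'take=box&take=candle&take=sword';

# One value per key: each assignment overwrites the previous one.
my %last_wins;
for my $pair (split /&/, $query) {
    my ($k, $v) = split /=/, $pair, 2;
    $last_wins{$k} = $v;
}
print "take = $last_wins{take}\n";   # take = sword

# Joining with a delimiter breaks when a value contains that delimiter.
my @original  = ('box,large', 'candle');
my $joined    = join ',', @original;
my @recovered = split /,/, $joined;
print scalar(@original), " values in, ", scalar(@recovered), " values out\n";
# 2 values in, 3 values out
```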


    As the comment states, this query parser does not allow for multiple values for a key, such as in the query take=box&take=candle&take=sword. There are generally two ways to get around this: make the value in the hash a single string of the values joined by a delimiter (a comma, or some other character such as NUL (\0)), or make it an array reference to the values. The difficulty with the first approach is being sure you choose a delimiter (or sequence of characters) that is not found in the data. So I suggest the array reference method:

    # %kv_pairs = get_query($squash, $strip);
    # a key seen once maps to a string; a repeated key maps to an array reference

    sub get_query {
        my ($squash, $strip) = @_;
        my $str = $ENV{QUERY_STRING};
        my %kv;

        # ; to & translation
        $str =~ tr/;/&/;

        # & squishing
        $str =~ tr/&//s if $squash;

        # leading/trailing & removal
        $str =~ s/^&+//, $str =~ s/&+$// if $strip;

        # for each k=v pair
        for (split /&/, $str) {
            # third arg of '2' because $_ might be 'a=b=c'
            my ($k, $v) = split /=/, $_, 2;

            # don't allow for blank key (an empty pair leaves $k undefined)
            next unless defined $k and length $k;

            # decode each part; an undefined value stays undefined
            $k = url_decode($k);
            $v = url_decode($v);

            if (not exists $kv{$k}) { $kv{$k} = $v }
            elsif (not ref $kv{$k}) { $kv{$k} = [ $kv{$k}, $v ] }
            else { push @{ $kv{$k} }, $v }
        }

        return %kv;
    }
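    Because a key's value may be either a plain string or an array reference, calling code has to check which one it received. One way to handle that is a small helper that always returns a list. This is only a sketch: values_for() is a hypothetical name, not part of the parser, and the hash is filled in by hand to mimic the parser's output.

```perl
use strict;
use warnings;

# Hypothetical result in the shape the parser returns:
# a plain scalar for a single value, an array reference for a repeated key.
my %kv = (
    name => 'Jeff Pinyan',
    take => [ 'box', 'candle', 'sword' ],
);

# Return every value stored for a key as a flat list (empty if the key is absent).
sub values_for {
    my ($hash, $key) = @_;
    return unless exists $hash->{$key};
    my $v = $hash->{$key};
    return ref($v) eq 'ARRAY' ? @$v : ($v);
}

for my $key (sort keys %kv) {
    print "$key: ", join(', ', values_for(\%kv, $key)), "\n";
}
# name: Jeff Pinyan
# take: box, candle, sword
```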


    If you are noticing that we need to know when to call which one of these functions, you're thinking ahead. After I show how to parse a simple POST and then a multipart/form-data (m/fd) POST (not file uploads yet -- that's later), I will show a "multiplexor" -- a function that decides which parser to call.

    You'll also notice that the GET parser excludes empty field names. This can be changed, if you like, by removing the line that skips blank keys. The final code will have that feature as an option to the parser. Also note that if there is a pair without an = (such as "a=b&foo&c=d") then the value is undef, whereas an = with no value after it (such as "a=b&foo=&c=d") sets the value to the empty string.
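    The undef versus empty-string distinction comes straight from how split() behaves with a limit of 2. A minimal sketch using the pair forms mentioned above; the %seen hash is just a place to keep the results for this example.

```perl
use strict;
use warnings;

my %seen;
for my $pair ('foo', 'foo=', 'a=b=c') {
    # With a positive limit, a trailing empty field is kept.
    my ($k, $v) = split /=/, $pair, 2;
    $seen{$pair} = [ $k, $v ];
    print "$pair => key '$k', value ", (defined $v ? "'$v'" : 'undef'), "\n";
}
# foo => key 'foo', value undef
# foo= => key 'foo', value ''
# a=b=c => key 'a', value 'b=c'
```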

    © 2003-2008 by Developer Shed. All rights reserved.