So What's A $#!%% Regular Expression, Anyway?!
By: Vikram Vaswani and Harish Kamath, (c) Melonfire
    2000-04-12


    Table of Contents:
  • So What's A $#!%% Regular Expression, Anyway?!
  • Ranging Far And Wide...
  • How To Say "Ummmm...." In Three Different Languages


    Regular expressions are one of the most powerful tools in the arsenal of any *NIX programmer. This article offers insights into what they are, how to go about constructing them, and how to add them to your Perl, PHP and JavaScript programs.

    Ask any relatively experienced *NIX user to list his top ten favorite things about the operating system, and you're almost certain to hear him mutter, somewhere between "99% uptime" and "remote system reboots", the phrase "regular expressions".

    Ask any relatively experienced *NIX user to list the ten things he hates most about the operating system, and somewhere between "zombie processes" and "installation", he's sure to spit out the phrase "regular expressions".

    And First There Was Love...

    Regular expressions, known as "regex" in the geek community, are a powerful tool used in pattern-matching and substitution. They are commonly associated with almost all *NIX-based tools, including editors like vi, scripting languages like Perl and PHP, and command-line programs like awk and sed. You'll even find them in client-side scripting languages like JavaScript - kinda like Madonna, their popularity cuts across languages and territorial boundaries...

    A regular expression lets you build patterns using a set of special characters; these patterns can then be compared with text in a file, data entered into an application, or input from a form filled in by users on a Web site. Depending on whether or not there's a match, appropriate action can be taken, and appropriate program code executed.

    For example, one of the most common applications of regular expressions is to check whether or not a user's email address, as entered into an online form, is in the correct format; if it is, the form is processed, whereas if it's not, a warning message pops up asking the user to correct the error. Regular expressions thus play an important role in the decision-making routines of Web applications - although, as you'll see, they can also be used to great effect in complex find-and-replace operations.

    A regular expression usually looks something like this:


    /love/
    All this does is match the pattern "love" in the text it's applied to. Like many other things in life, it's simpler to get your mind around the pattern than the concept - but then, that's neither here nor there...
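To see that pattern at work, here is a minimal sketch using Python's standard `re` module, chosen purely for illustration; the sample strings are made up. Python has no `/.../` delimiters, so the pattern is written as a raw string instead, but the pattern itself is unchanged.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "All you need is love."

# re.search scans the whole string for the first place the pattern occurs.
match = re.search(r"love", text)
found = match is not None

# The pattern is a plain run of characters, so it also matches inside longer words.
found_in_glove = re.search(r"love", "a single glove") is not None

# No occurrence of "love" means no match object is returned.
found_elsewhere = re.search(r"love", "nothing to see here") is not None

print(found, found_in_glove, found_elsewhere)
```

Note that the match inside "glove" is expected: a bare pattern like this says nothing about word boundaries.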

    How about something a little more complex? Try this:

    /fo+/
    This would match the words "fool", "footsie" and "four-seater". And although it's a pretty silly example, you have to admit that there's truth to it - after all, who but fools in love would play footsie in a four-seater?

    The "+" that you see above is the first of what are called "meta-characters" - these are characters that have a special meaning when used within a pattern. The "+" meta-character is used to match one or more occurrences of the preceding character - in the example above, the letter "f" followed by one or more occurrences of the letter "o".

    Similar to the "+" meta-character, we have "*" and "?" - these are used to match zero or more occurrences of the preceding character, and zero or one occurrence of the preceding character, respectively. So,

    /eg*/
    would match "easy", "egocentric" and "egg"

    while

    /Wil?/
    would match "Winnie", "Wimpy", "Wilson" and "William", though not "Wendy" or "Wolf".
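The three quantifiers can be tried out against the word lists above. The sketch below again uses Python's `re` module for illustration; the helper `matches` and the extra non-matching words ("fish", "and") are assumptions added for the demonstration, not part of the original examples.

```python
import re

def matches(pattern, words):
    """Return the words in which the pattern is found somewhere (hypothetical helper)."""
    return [w for w in words if re.search(pattern, w)]

# "+" : one or more of the preceding character
plus_hits = matches(r"fo+", ["fool", "footsie", "four-seater", "fish"])

# "*" : zero or more of the preceding character
star_hits = matches(r"eg*", ["easy", "egocentric", "egg", "and"])

# "?" : zero or one of the preceding character
optional_hits = matches(r"Wil?", ["Winnie", "Wimpy", "Wilson", "William", "Wendy", "Wolf"])

# The text each pattern actually consumes at its first match
plus_text = re.search(r"fo+", "footsie").group()
star_text = re.search(r"eg*", "easy").group()
optional_text = re.search(r"Wil?", "Winnie").group()

print(plus_hits, star_hits, optional_hits)
print(plus_text, star_text, optional_text)
```

The last three lines are worth a look: the pattern matches only part of each word ("foo", "e" and "Wi" respectively), which is why "easy" and "Winnie" count as matches even though they contain no "g" or "l".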

    In case all this seems a little too imprecise, you can also specify a range for the number of matches. For example, the regular expression

    /jim{2,6}/
    would match "jimmy" and "jimmmmmy!", but not "jim". The numbers in the curly braces represent the lower and upper values of the range to match; you can leave out the upper limit for an open-ended range match.

    Of Carrots, Bombshells And Four-Figure Incomes

    Now that you've got the basics down, how about taking it to the next level? It's also possible to search for white space, numbers and alphabetic characters with a regular expression - and here's the merry gang of meta-characters that will help you do just that:

    \s = used to match a single white space character, including tabs and newline characters

    \S = used to match everything that is *not* a white space character

    \d = used to match a single digit from 0 to 9

    \w = used to match letters, numbers and underscores

    \W = used to match anything that does not match with \w

    . = used to match everything except the newline character

    Now, you're probably thinking, "Hey, that's great - but what does it all mean?!". Well, suppose you wanted to find all the white space in a document...

    /\s+/
    Easy, isn't it? If you're looking only for numbers, try

    /\d/
    So, if you had a complex financial spreadsheet in front of you, and you wanted to quickly find round amounts in the thousands, such as 1000 or 5000 (any digit followed by three zeros), you could use

    /\d000/
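Here is a rough sketch of these character classes in action, using Python's `re` module for illustration. The spreadsheet row is invented sample data. One point to keep in mind: a digit followed by "000" only catches amounts containing three consecutive zeros, so the last line borrows the open-ended range syntax from earlier, `{4,}`, to catch any number with four or more digits.

```python
import re

# Hypothetical spreadsheet row, for illustration only.
row = "rent 1000\tfood 450\nbonus 25000 misc 1500"

# \s+ : every run of white space (spaces, tabs, newlines)
whitespace_runs = re.findall(r"\s+", row)

# \d : every single digit
digits = re.findall(r"\d", "a1b22")

# \d000 : a digit followed by three zeros
round_thousands = re.findall(r"\d000", row)

# \d{4,} : four or more digits in a row (open-ended range)
four_plus_digits = re.findall(r"\d{4,}", row)

print(len(whitespace_runs), digits)
print(round_thousands)
print(four_plus_digits)
```

In this sample, the digit-plus-"000" pattern picks "5000" out of the middle of "25000" and misses "1500" entirely, while the range version returns each large number whole.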
    How about limiting your search to the beginning or end of a string? Well, that's why we have "pattern anchors" - these simply tie your regular expression to either the first or last character of the string, and come in very useful when you're looking for a way to filter through a mass of matches.

    There are two basic pattern anchors - the first one is represented by a caret [^], and is used to indicate that the expression should be matched only at the beginning of the string that it is applied to. For example, the expression

    /^hell/
    will return a match only if the string it is applied to begins with "hell" - "hello" and "hellhound", but not "shell".

    And similarly, to match the end of a string, there's the "$" pattern anchor. So

    /ar$/
    would match "scar", "car" and "bar", though not "art", "army" or "arrow".
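The two anchors can be checked against the same words. This is a small sketch in Python's `re` module, for illustration only; each word is tested as a separate string, since the anchors tie the pattern to the start or end of whatever string it is applied to.

```python
import re

words = ["hello", "hellhound", "shell", "scar", "car", "bar", "art", "army", "arrow"]

# ^ : the pattern must sit at the very beginning of the string
starts_with_hell = [w for w in words if re.search(r"^hell", w)]

# $ : the pattern must sit at the very end of the string
ends_with_ar = [w for w in words if re.search(r"ar$", w)]

# Without the anchor, "hell" is found inside "shell" as well
unanchored = [w for w in words if re.search(r"hell", w)]

print(starts_with_hell)
print(ends_with_ar)
print(unanchored)
```

Comparing the first and last lists shows what the anchor buys you: "shell" drops out as soon as the caret is added.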

     
     