
Regular Expression Syntax (POSIX) - PHP

Strings and regular expressions are among the basic tools that help programmers get their jobs done. This five-part article series covers how these are used in PHP. It is excerpted from chapter nine of the book Beginning PHP and Oracle: From Novice to Professional, written by W. Jason Gilmore and Bob Bryla (Apress; ISBN: 1590597702).

TABLE OF CONTENTS:
  1. Strings and Regular Expressions
  2. Regular Expression Syntax (POSIX)
  3. PHP’s Regular Expression Functions (POSIX Extended)
  4. Regular Expression Syntax (Perl)
By: Apress Publishing
June 17, 2010

The structure of a POSIX regular expression is similar to that of a typical arithmetic expression: various elements (operators) are combined to form a more complex expression. The meaning of the combined regular expression elements is what makes them so powerful. You can locate not only literal expressions, such as a specific word or number, but also a multitude of semantically different but syntactically similar strings, such as all HTML tags in a file.


Note: POSIX stands for Portable Operating System Interface for Unix and refers to a set of standards originally intended for Unix-based operating systems. POSIX regular expression syntax is an attempt to standardize how regular expressions are implemented in many programming languages.


The simplest regular expression is one that matches a single character, such as g, which would match strings such as gog, haggle, and bag. You could combine several letters together to form larger expressions, such as gan, which logically would match any string containing gan: gang, organize, or Reagan, for example.

You can also test for several different expressions simultaneously by using the pipe (|) character. For example, you could test for php or zend via the regular expression php|zend.
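The literal and pipe patterns above can be tried out with a minimal sketch like the following. It assumes PHP's PCRE functions preg_match() and preg_grep() in place of the POSIX ereg() family, which was deprecated in PHP 5.3 and removed in PHP 7; these simple patterns read the same in both syntaxes. The surrounding slashes are PCRE delimiters, and the sample strings and variable names are illustrative only.

```php
<?php
// Assumption: PCRE functions stand in for the POSIX ereg() functions.
// The slashes around each pattern are PCRE delimiters, not part of the pattern.

// A literal pattern matches anywhere inside the subject string.
$words = ['gang', 'organize', 'Reagan', 'garden'];
$containingGan = array_values(preg_grep('/gan/', $words));
// $containingGan is ['gang', 'organize', 'Reagan']

// The pipe tests for either alternative.
$mentionsPhp  = preg_match('/php|zend/', 'powered by php');   // 1
$mentionsZend = preg_match('/php|zend/', 'the zend engine');  // 1
$mentionsNone = preg_match('/php|zend/', 'written in perl');  // 0
```

preg_match() returns 1 when the pattern is found and 0 when it is not, so the results can be used directly in an if statement.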

Before getting into PHP’s POSIX-based regular expression functions, let’s review three methods that POSIX supports for locating different character sequences: brackets, quantifiers, and predefined character ranges.

Brackets

Brackets ([]) are used to represent a list, or range, of characters to be matched. For instance, unlike the regular expression php, which will locate strings containing the explicit string php, the regular expression [php] will find any string containing the character p or h. Several commonly used character ranges follow:

  1. [0-9] matches any decimal digit from 0 through 9.
  2. [a-z] matches any character from lowercase a through lowercase z.
  3. [A-Z] matches any character from uppercase A through uppercase Z.
  4. [A-Za-z] matches any letter, from uppercase A through Z or lowercase a through z.

Of course, the ranges shown here are general; you could also use the range [0-3] to match any decimal digit ranging from 0 through 3, or the range [b-v] to match any lowercase character ranging from b through v. In short, you can specify any ASCII range you wish.
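As a rough illustration of bracket lists and ranges, the sketch below again assumes preg_match() (PCRE) rather than the POSIX ereg() functions that PHP 7 removed. The subject strings are made up for the example.

```php
<?php
// Assumption: preg_match() (PCRE) stands in for the POSIX ereg() functions.

// [php] is a character list: it matches a single p or h, not the word "php".
$listHit  = preg_match('/[php]/', 'hat');  // 1, because of the h
$listMiss = preg_match('/[php]/', 'dog');  // 0, no p or h

// Ranges can be as narrow as needed.
$lowDigit  = preg_match('/[0-3]/', 'room 2');  // 1
$highDigit = preg_match('/[0-3]/', 'room 7');  // 0
$midLetter = preg_match('/[b-v]/', 'zzz');     // 0, z is outside b-v
```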

Quantifiers

Sometimes you might want to create regular expressions that look for characters based on their frequency or position. For example, you might want to look for strings containing one or more instances of the letter p, strings containing at least two p’s, or even strings with the letter p as their beginning or terminating character. You can make these demands by inserting special characters into the regular expression. Here are several examples of these characters:

  1. p+ matches any string containing at least one p.
  2. p* matches any string containing zero or more p’s.
  3. p? matches any string containing zero or one p.
  4. p{2} matches any string containing a sequence of two p’s.
  5. p{2,3} matches any string containing a sequence of two or three p’s.
  6. p{2,} matches any string containing a sequence of at least two p’s.
  7. p$ matches any string with p at the end of it.
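The quantifiers listed above can be sketched as follows. This is not the POSIX ereg() API the chapter covers; it assumes preg_match() from PHP's PCRE extension, which accepts the same quantifier syntax. The test words are arbitrary.

```php
<?php
// Assumption: preg_match() (PCRE) stands in for the POSIX ereg() functions.

$onePlus   = preg_match('/p+/', 'apple');    // 1, at least one p
$zeroPlus  = preg_match('/p*/', 'banana');   // 1, zero p's still counts
$twoInRow  = preg_match('/p{2}/', 'apple');  // 1, "pp" is present
$twoApart  = preg_match('/p{2}/', 'paper');  // 0, the p's are not adjacent
$endsWithP = preg_match('/p$/', 'php');      // 1
$endsElse  = preg_match('/p$/', 'phpBB');    // 0

// preg_match() can also report what was matched. Quantifiers are greedy,
// so p{2,3} takes three p's when three are available.
preg_match('/p{2,3}/', 'apppple', $found);
// $found[0] is 'ppp'
```

Because p* and p? both accept zero occurrences, on their own they succeed against every string; they become useful once combined with other pattern elements.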

Still other flags can be inserted before and within a character sequence: 
 

  1. ^p matches any string with p at the beginning of it.
  2. [^a-zA-Z] matches any string containing at least one character that is not a letter, that is, a character outside the ranges a through z and A through Z.
  3. p.p matches any string containing p, followed by any character, in turn followed by another p.
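A short sketch of the caret, the negated bracket list, and the dot, assuming preg_match() (PCRE) as a stand-in for the removed POSIX ereg() functions. Note that a negated list succeeds as soon as one character outside the list is found anywhere in the string.

```php
<?php
// Assumption: preg_match() (PCRE) stands in for the POSIX ereg() functions.

// ^ anchors the pattern to the start of the string.
$startsWithP = preg_match('/^p/', 'php');     // 1
$startsElse  = preg_match('/^p/', 'zephyr');  // 0

// [^a-zA-Z] needs one non-letter somewhere in the string.
$hasNonLetter = preg_match('/[^a-zA-Z]/', 'abc1');  // 1, because of the 1
$onlyLetters  = preg_match('/[^a-zA-Z]/', 'abc');   // 0

// The dot stands for exactly one character between the two p's.
$dotHit  = preg_match('/p.p/', 'php');    // 1
$dotMiss = preg_match('/p.p/', 'apple');  // 0, nothing between the p's
```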

You can also combine special characters to form more complex expressions. Consider the following examples:

  1. ^.{2}$ matches any string containing exactly two characters.
  2. <b>(.*)</b> matches any string enclosed within <b> and </b>.
  3. p(hp)* matches any string containing a p followed by zero or more instances of the sequence hp.
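The three combined expressions can be exercised with the sketch below. It assumes preg_match() (PCRE) instead of the POSIX ereg() functions, and uses the optional third argument to capture what the parentheses matched; the variable names are hypothetical.

```php
<?php
// Assumption: preg_match() (PCRE) stands in for the POSIX ereg() functions.

$exactlyTwo = preg_match('/^.{2}$/', 'ab');   // 1
$tooLong    = preg_match('/^.{2}$/', 'abc');  // 0

// Parentheses group the pattern and capture what matched inside them.
// "#" is used as the delimiter so the "/" in </b> needs no escaping.
preg_match('#<b>(.*)</b>#', 'a <b>bold</b> word', $tag);
// $tag[1] is 'bold'

// The group (hp)* may repeat zero or more times after the p.
preg_match('/p(hp)*/', 'phphp', $run);
// $run[0] is 'phphp'
```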

You may wish to search for these special characters in strings instead of using them in the special context just described. To do so, the characters must be escaped with a backslash (\). For example, if you want to search for a dollar amount, a plausible regular expression would be as follows: ([\$])([0-9]+); that is, a dollar sign followed by one or more digits. Notice the backslash preceding the dollar sign. Potential matches of this regular expression include $42, $560, and $3.
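The dollar-amount expression can be checked with a sketch such as this one, which assumes preg_match() (PCRE) rather than the POSIX ereg() functions. The pattern is written in single quotes so that PHP itself does not try to interpret the dollar sign or the backslash.

```php
<?php
// Assumption: preg_match() (PCRE) stands in for the POSIX ereg() functions.
// Single quotes keep PHP from treating \$ or $ as its own syntax.
$pattern = '/([\$])([0-9]+)/';

preg_match($pattern, 'Total: $42', $m);
// $m[0] is '$42', $m[1] is '$', $m[2] is '42'

$noAmount = preg_match($pattern, 'Total: 42');  // 0, no dollar sign
```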

Predefined Character Ranges (Character Classes)

For reasons of convenience, several predefined character ranges, also known as character classes, are available. Character classes specify an entire range of characters—for example, the alphabet or an integer set. Standard classes include the following:

[:alpha:]: Lowercase and uppercase alphabetical characters. This can also be specified as [A-Za-z].

[:alnum:]: Lowercase and uppercase alphabetical characters and numerical digits. This can also be specified as [A-Za-z0-9].

[:cntrl:]: Control characters such as tab, escape, or backspace.

[:digit:]: Numerical digits 0 through 9. This can also be specified as [0-9].

[:graph:]: Printable characters found in the range of ASCII 33 to 126.

[:lower:]: Lowercase alphabetical characters. This can also be specified as [a-z].

[:punct:]: Punctuation characters, including ~ ` ! @ # $ % ^ & * ( ) - _ + = { } [ ] : ; ' < > , . ? and /.

[:upper:]: Uppercase alphabetical characters. This can also be specified as [A-Z].

[:space:]: Whitespace characters, including the space, horizontal tab, vertical tab, new line, form feed, or carriage return.

[:xdigit:]: Hexadecimal characters. This can also be specified as [a-fA-F0-9].
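A brief sketch of the character classes in use, assuming preg_match() (PCRE), which recognizes the same class names, in place of the POSIX ereg() functions. A class name is written inside a bracket expression, so [:digit:] appears in a pattern as [[:digit:]].

```php
<?php
// Assumption: preg_match() (PCRE) stands in for the POSIX ereg() functions.
// A class name goes inside a bracket expression, hence the double brackets.

$hasDigit = preg_match('/[[:digit:]]/', 'PHP 5');       // 1
$noDigit  = preg_match('/[[:digit:]]/', 'PHP');         // 0
$allAlnum = preg_match('/^[[:alnum:]]+$/', 'php5');     // 1
$notAlnum = preg_match('/^[[:alnum:]]+$/', 'php 5');    // 0, space is not alphanumeric
$isHex    = preg_match('/^[[:xdigit:]]+$/', 'ff00CC');  // 1
$hasSpace = preg_match('/[[:space:]]/', "tab\there");   // 1, tab is whitespace
```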