
Regular Expression Syntax (Perl) - PHP

Strings and regular expressions are among the basic tools that help programmers get their jobs done. This five-part article series covers how these are used in PHP. It is excerpted from chapter nine of the book Beginning PHP and Oracle: From Novice to Professional, written by W. Jason Gilmore and Bob Bryla (Apress; ISBN: 1590597702).

TABLE OF CONTENTS:
  1. Strings and Regular Expressions
  2. Regular Expression Syntax (POSIX)
  3. PHP’s Regular Expression Functions (POSIX Extended)
  4. Regular Expression Syntax (Perl)
By: Apress Publishing
June 17, 2010


Perl has long been considered one of the most powerful parsing languages ever written, and it provides a comprehensive regular expression language that can be used to search and replace even the most complicated of string patterns. The developers of PHP felt that instead of reinventing the regular expression wheel, so to speak, they should make the famed Perl regular expression syntax available to PHP users.

Perl’s regular expression syntax is actually a derivation of the POSIX implementation, resulting in considerable similarities between the two. You can use any of the quantifiers introduced in the previous POSIX section. The remainder of this section is devoted to a brief introduction of Perl regular expression syntax. Let’s start with a simple example of a Perl-based regular expression:

/food/

Notice that the string food is enclosed between two forward slashes. Just as with POSIX regular expressions, you can build a more complex string through the use of quantifiers:

/fo+/

This will match f followed by one or more occurrences of o. Some potential matches include food, fool, and fo4. Here is another example of using a quantifier:
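In PHP, these patterns are passed to the preg_ functions as ordinary strings, with the delimiters included. Here is a minimal sketch, assuming PHP 7 or later and sample strings chosen only for illustration. It reports what /fo+/ actually matches in each string. Because + is greedy, food yields foo rather than just fo.

```php
<?php
// /fo+/ : an "f" followed by one or more "o" characters, anywhere in the string.
$pattern = '/fo+/';
$samples = ['food', 'fool', 'fo4', 'f4', 'of'];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match, 0 on no match, and false on error;
    // $m[0] receives the text matched by the whole pattern.
    if (preg_match($pattern, $sample, $m) === 1) {
        $results[$sample] = $m[0];
    } else {
        $results[$sample] = null;
    }
    echo $sample, ': ', $results[$sample] ?? 'no match', "\n";
}
```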

/fo{2,4}/

This matches f followed by two to four occurrences of o. Some potential matches include fool, fooool, and foosball.
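The bounded quantifier can be checked the same way. This sketch assumes PHP 7 or later, and its sample strings are illustrative only. It records the exact text that /fo{2,4}/ matches in each string.

```php
<?php
// /fo{2,4}/ : an "f" followed by at least two and at most four "o" characters.
$pattern = '/fo{2,4}/';
$samples = ['fo', 'fool', 'fooool', 'foosball', 'fooooooo'];

$results = [];
foreach ($samples as $sample) {
    // Store the matched text, or null when the pattern does not occur.
    $results[$sample] = preg_match($pattern, $sample, $m) === 1 ? $m[0] : null;
    echo $sample, ': ', $results[$sample] ?? 'no match', "\n";
}
```

The upper bound limits how much the pattern consumes, not what the string may contain. The string fooooooo still matches, and the match simply stops after four o characters.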

Modifiers

Often you’ll want to tweak the interpretation of a regular expression; for example, you may want to tell the regular expression to execute a case-insensitive search or to ignore comments embedded within its syntax. These tweaks are known as modifiers, and they go a long way toward helping you to write short and concise expressions. A few of the more interesting modifiers are outlined in Table 9-1.

Table 9-1. Six Sample Modifiers

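Modifier    Description

i           Perform a case-insensitive search.

g           Find all occurrences (perform a global search). This modifier exists in Perl but not in PHP, whose preg_ functions reject g as an unknown modifier. In PHP, use preg_match_all() to find all occurrences; preg_replace() already replaces every occurrence by default.

m           Treat a string as several (m for multiple) lines. By default, the ^ and $ characters match only at the very start and very end of the string in question. Using the m modifier allows ^ to match at the beginning of any line in the string and $ to match at the end of any line.

s           Make the . metacharacter match every character, including newlines; by default it matches anything except a newline. This is often described as treating the string as a single line, but s does not change how ^ and $ behave, so it is not simply the opposite of the m modifier, and the two can be combined.

x           Ignore white space and # comments within the regular expression.

U           Make quantifiers "ungreedy." Quantifiers are normally "greedy": they match as many characters as possible. With U they match as few as possible instead. This modifier is specific to PHP; in Perl you would append ? to an individual quantifier (as in .+?).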
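The effect of m, s, x, and U is easiest to see side by side. Below is a minimal sketch, assuming PHP 7 or later, that runs each pattern with and without its modifier. The sample strings are invented for illustration.

```php
<?php
$text = "first line\nsecond line";

// m: without it, ^ anchors only at the start of the whole string.
$withoutM = preg_match_all('/^\w+/', $text, $a);   // matches "first" only
$withM = preg_match_all('/^\w+/m', $text, $b);     // matches "first" and "second"

// s: without it, . refuses to match "\n", so this pattern cannot span the line break.
$dotNoS = preg_match('/line.second/', $text);
$dotS = preg_match('/line.second/s', $text);

// x: white space in the pattern is ignored and # starts a comment.
$dateX = preg_match('/
    (\d{4})  # year
    -
    (\d{2})  # month
/x', '2010-06', $date);

// U: quantifiers become lazy, so .+ stops at the first ">" it can.
$html = '<b>bold</b> and <i>italic</i>';
preg_match('/<.+>/', $html, $greedy);   // whole string, first "<" to last ">"
preg_match('/<.+>/U', $html, $lazy);    // just "<b>"

echo $greedy[0], "\n", $lazy[0], "\n";
```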

These modifiers are placed directly after the regular expression, for instance /string/i. Let’s consider a few examples:

/wmd/i: Matches WMD, wMD, WMd, wmd, and any other case variation of the string wmd.

/taxation/i: Locates the word taxation regardless of case. In Perl you would add the g modifier (/taxation/gi) to find every occurrence. PHP does not accept g, so pass the pattern to preg_match_all() to tally up the total number of occurrences, or to preg_replace() to replace all occurrences with some other string.
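This sketch shows both examples in PHP, assuming PHP 7 or later and invented sample text. preg_grep() filters an array by pattern. preg_match_all() stands in for Perl's g modifier. preg_replace() replaces every occurrence unless you pass it a limit.

```php
<?php
// i: every case variant of "wmd" matches; preg_grep() preserves keys, so reindex.
$variants = ['WMD', 'wMD', 'WMd', 'wmd', 'wmx'];
$matching = array_values(preg_grep('/wmd/i', $variants));

$text = 'Taxation without representation is still taxation. TAXATION!';

// PHP rejects the Perl-only g modifier: preg_match() warns and returns false.
$bad = @preg_match('/taxation/gi', $text);

// preg_match_all() returns the number of matches found.
$count = preg_match_all('/taxation/i', $text, $found);

// preg_replace() replaces all occurrences by default.
$replaced = preg_replace('/taxation/i', 'levies', $text);
echo $count, "\n", $replaced, "\n";
```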

Metacharacters

Perl regular expressions also employ metacharacters to further filter their searches. Many of these are an alphabetical character preceded by a backslash (such as \d), which gives that character a special meaning; others are punctuation characters such as [, (, ^, and $. A list of useful metacharacters follows:

\A: Matches only at the beginning of the string.

\b: Matches a word boundary.

\B: Matches anything but a word boundary.

\d: Matches a digit character. This is the same as [0-9].

\D: Matches a nondigit character.

\s: Matches a whitespace character.

\S: Matches a nonwhitespace character.

[]: Encloses a character class.

(): Encloses a character grouping or defines a back reference.

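$: Matches the end of the string, or the end of any line when the m modifier is used.

^: Matches the beginning of the string, or the beginning of any line when the m modifier is used.

.: Matches any character except for the newline (unless the s modifier is used).

\: Escapes the next metacharacter so that it is matched literally; for example, \$ matches a dollar sign.

\w: Matches any single letter, digit, or underscore character. This is the same as [a-zA-Z0-9_].

\W: Matches any single character that is not a letter, digit, or underscore.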
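Several of these metacharacters can be exercised against one sample string. This is a sketch assuming PHP 7 or later, and the record, phone number, and version string are made up for illustration. Note that each \d, \w, \s, \D, or \W stands for a single character, so a quantifier such as + is needed to match a run of them.

```php
<?php
$record = 'Order 66 shipped_to: user_42';

preg_match_all('/\d+/', $record, $digits);      // runs of digits
preg_match_all('/\w+/', $record, $words);       // runs of letters, digits, underscores
$pieces = preg_split('/\W+/', $record);          // split on runs of non-word characters
$spaces = preg_match_all('/\s/', $record);       // count whitespace characters

$startsWithOrder = preg_match('/\AOrder/', $record);  // \A: very start of the string
$endsWithDigit = preg_match('/\d$/', $record);        // $: end of the string

// \D: remove every nondigit character.
$phoneDigits = preg_replace('/\D/', '', '(555) 123-4567');

// (): a group captured as \1 and reused as a back reference to find a doubled letter.
$doubled = preg_match('/([a-z])\1/', 'shipped', $pair);

// \. escapes the dot so it matches a literal period.
$isVersion = preg_match('/\A\d+\.\d+$/', '5.3');

print_r($words[0]);
```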

Let’s consider a few examples. The first regular expression will match strings such as pisa and lisa but not sand:

/sa\b/
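One quick way to check this in PHP is preg_grep(), which keeps the array elements that match a pattern. The sketch assumes PHP 7 or later, and the word list is illustrative.

```php
<?php
// "sa" followed by \b: the "sa" must end a word.
$candidates = ['pisa', 'lisa', 'sand', 'Lisa is here'];
$hits = array_values(preg_grep('/sa\b/', $candidates));
print_r($hits);  // pisa, lisa, and "Lisa is here" (where "Lisa" ends before a space)
```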

The next returns the first case-insensitive occurrence of the word linux:

/\blinux\b/i
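preg_match() stops at the first match, and the PREG_OFFSET_CAPTURE flag also reports where that match begins (as a byte offset). The sketch below assumes PHP 7 or later and uses an invented sentence. Because of the two \b anchors, linux embedded in a longer word is rejected.

```php
<?php
$text = 'I run Linux at home; LINUX at work, but penguinlinux and linuxes do not count.';

// First match only, with its byte offset in the subject string.
preg_match('/\blinux\b/i', $text, $m, PREG_OFFSET_CAPTURE);
echo $m[0][0], ' at offset ', $m[0][1], "\n";

// Every standalone occurrence: "penguinlinux" fails the leading \b,
// and "linuxes" fails the trailing \b.
$all = preg_match_all('/\blinux\b/i', $text);
```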

The opposite of the word boundary metacharacter is \B, which matches on anything but a word boundary. Therefore this example will match strings such as sand and sally but not Melissa. Because the pattern is case-sensitive, Sally matches only if you add the i modifier:

/sa\B/
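Running the pattern over a few words with and without the i modifier shows the case-sensitivity point in practice. This sketch assumes PHP 7 or later, with an illustrative word list.

```php
<?php
$candidates = ['sand', 'sally', 'Sally', 'Melissa'];

// "sa" followed by \B: another word character must follow, so "Melissa" fails.
$caseSensitive = array_values(preg_grep('/sa\B/', $candidates));
$caseInsensitive = array_values(preg_grep('/sa\B/i', $candidates));

print_r($caseSensitive);
print_r($caseInsensitive);
```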

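The final example matches a dollar sign followed by one or more digits. Perl would add the g modifier to return every instance; PHP has no g modifier, so pass the pattern to preg_match_all() to collect them all:

/\$\d+/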
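Here is a minimal sketch, assuming PHP 7 or later and a made-up invoice string. The pattern is written in single quotes so that \$ reaches the regular expression engine unchanged. In a double-quoted PHP string, "\$" would become a bare $, which the engine would read as an end-of-string anchor.

```php
<?php
$invoice = 'Parts cost $45, labor $120, and a 15% fee applies to $165.';

// preg_match_all() returns the number of matches and fills $matches[0] with them.
$count = preg_match_all('/\$\d+/', $invoice, $matches);
print_r($matches[0]);
```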

Please check back next week for the continuation of this article.



 
 