Functions of Strings and Regular Expressions

In this second part of a five-part series on strings and regular expressions in PHP, you’ll learn about regular expression functions and a variety of string-specific functions. This article is excerpted from chapter nine of the book Beginning PHP and Oracle: From Novice to Professional, written by W. Jason Gilmore and Bob Bryla (Apress; ISBN: 1590597702).

PHP’s Regular Expression Functions (Perl Compatible)

PHP offers seven functions for searching strings using Perl-compatible regular expressions: preg_grep(), preg_match(), preg_match_all(), preg_quote(), preg_replace(), preg_replace_callback(), and preg_split(). These functions are introduced in the following sections.

Searching an Array

The preg_grep() function searches all elements of an array, returning an array consisting of all elements matching a certain pattern. Its prototype follows:

array preg_grep(string pattern, array input [, int flags])

Consider an example that uses this function to search an array for foods beginning with p:

<?php
    $foods = array("pasta", "steak", "fish", "potatoes");
    $food = preg_grep("/^p/", $foods);
    print_r($food);
?>

This returns the following:

——————————————–
Array ( [0] => pasta [3] => potatoes )

——————————————–

Note that the returned array preserves the keys of the input array: each matching value keeps the index it had in the input, and nonmatching elements are simply omitted, so the keys are not necessarily consecutive. If you want the result reindexed from zero, pass the output array through the function array_values(), introduced in Chapter 5.

The optional input parameter flags was added in PHP version 4.3. It accepts one value, PREG_GREP_INVERT . Passing this flag will result in retrieval of those array elements that do not match the pattern.
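As a minimal sketch of the two points above, the following reuses the $foods array from the earlier example to show reindexing with array_values() and inverted matching with PREG_GREP_INVERT. The variable names $pFoods, $reindexed, and $others are chosen here for illustration only.

```php
<?php
$foods = array("pasta", "steak", "fish", "potatoes");

// Matching elements keep their original keys (0 and 3).
$pFoods = preg_grep("/^p/", $foods);

// array_values() reindexes the result from zero.
$reindexed = array_values($pFoods);

// PREG_GREP_INVERT returns the elements that do NOT match the pattern.
$others = preg_grep("/^p/", $foods, PREG_GREP_INVERT);

print_r($reindexed); // keys 0 and 1: pasta, potatoes
print_r($others);    // keys 1 and 2: steak, fish
```

Note that the inverted result also keeps the original keys, so array_values() is just as useful there.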

Searching for a Pattern

The preg_match() function searches a string for a specific pattern, returning 1 if a match is found and 0 otherwise (or FALSE if an error occurs). Its prototype follows:

int preg_match(string pattern, string string [, array matches
               [, int flags [, int offset]]])

The optional input parameter matches is filled with the results of the search: the first element holds the text matching the full pattern, and subsequent elements hold the text matching each parenthesized subpattern, if applicable. Here’s an example that uses preg_match() to perform a case-insensitive search:

<?php
    $line = "vim is the greatest word processor ever created!";
    // \b marks a word boundary, so only the standalone word matches
    if (preg_match("/\bvim\b/i", $line, $match)) print "Match found!";
?>

This script will confirm a match if the word Vim or vim appears on its own, but not if it appears only as part of a longer word such as simplevim, vims, or evim.
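To illustrate how the optional third parameter is populated, here is a small sketch that captures subpatterns. The sample sentence and its version number are invented for this illustration and are not part of the original example.

```php
<?php
// Hypothetical input containing a name followed by a version number.
$line = "Vim 9.1 is the greatest word processor ever created!";

// Three parenthesized subpatterns: the name, the major and the minor version.
$found = preg_match('/\b(vim)\s+(\d+)\.(\d+)/i', $line, $matches);

if ($found === 1) {
    echo $matches[0] . "\n"; // full match: Vim 9.1
    echo $matches[1] . "\n"; // first subpattern: Vim
    echo $matches[2] . "\n"; // second subpattern: 9
    echo $matches[3] . "\n"; // third subpattern: 1
}
```

Comparing the return value to 1 explicitly distinguishes a successful match from both "no match" (0) and an error (FALSE).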

Matching All Occurrences of a Pattern

The preg_match_all() function matches all occurrences of a pattern in a string, assigning each occurrence to an array in the order you specify via an optional input parameter. Its prototype follows:

int preg_match_all(string pattern, string string, array pattern_array
                   [, int order])

The order parameter accepts two values:

  1. PREG_PATTERN_ORDER is the default if the optional order parameter is not included. PREG_PATTERN_ORDER specifies the order in the way that you might think most logical: $pattern_array[0] is an array of all complete pattern matches, $pattern_array[1] is an array of all strings matching the first parenthesized regular expression, and so on.
  2. PREG_SET_ORDER groups the results by match rather than by subpattern. $pattern_array[0] contains the first complete match followed by the text captured by each of its parenthesized subpatterns, $pattern_array[1] contains the same information for the second match, and so on.

Here’s how you would use preg_match_all() to find all strings enclosed in bold HTML tags:

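<?php
    $userinfo = "Name: <b>Zeev Suraski</b> <br> Title: <b>PHP Guru</b>";
    // The slash in the closing tag is escaped because / is the pattern delimiter
    preg_match_all("/<b>(.*)<\/b>/U", $userinfo, $pat_array);
    printf("%s <br /> %s", $pat_array[0][0], $pat_array[0][1]);
?>

Viewed in a browser, this displays the following, with each name in bold: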

——————————————–
Zeev Suraski
PHP Guru

——————————————–
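To compare the two order settings side by side, the following sketch runs the same pattern against the same string twice. The variable names $byPattern and $bySet are chosen here for illustration.

```php
<?php
$userinfo = "Name: <b>Zeev Suraski</b> <br> Title: <b>PHP Guru</b>";
$pattern  = '/<b>(.*)<\/b>/U';

// Default ordering: one array per subpattern.
preg_match_all($pattern, $userinfo, $byPattern, PREG_PATTERN_ORDER);

// Set ordering: one array per match.
preg_match_all($pattern, $userinfo, $bySet, PREG_SET_ORDER);

// $byPattern[1] holds every string captured by the first subpattern.
print_r($byPattern[1]); // Zeev Suraski, PHP Guru

// $bySet[0] holds the first full match followed by its captured text.
print_r($bySet[0]);     // <b>Zeev Suraski</b>, Zeev Suraski
```

PREG_SET_ORDER tends to be convenient when you loop over matches one at a time, while PREG_PATTERN_ORDER is handy when you want a single list of captured values.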

Delimiting Special Regular Expression Characters

The function preg_quote() inserts a backslash before every character of special significance to regular expression syntax. These special characters include . \ + * ? [ ^ ] $ ( ) { } = ! < > | : and -. Its prototype follows:

string preg_quote(string str [, string delimiter])

The optional parameter delimiter specifies what delimiter is used for the regular expression, causing it to also be escaped by a backslash. Consider an example:

<?php
    $text = "Tickets for the bout are going for $500.";
    echo preg_quote($text);
?>

This returns the following:

——————————————–
Tickets for the bout are going for \$500\.
——————————————–
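A common use of preg_quote() is to embed literal text safely inside a larger pattern. The sketch below assumes a slash-delimited pattern and therefore passes "/" as the delimiter argument; the search string and variable names are hypothetical.

```php
<?php
// Literal text that happens to contain regex metacharacters and a slash.
$needle = '$5.00/ticket';

// Escape metacharacters and the "/" delimiter.
$quoted = preg_quote($needle, '/');
echo $quoted . "\n"; // \$5\.00\/ticket

// Build a pattern that matches the text literally.
$pattern = '/' . $quoted . '/';

// The escaped period only matches a real period.
$hit  = preg_match($pattern, 'Seats cost $5.00/ticket today');
$miss = preg_match($pattern, 'Seats cost $5x00/ticket today');
```

Without the call to preg_quote(), the unescaped slash would end the pattern early and the period would match any character.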

Replacing All Occurrences of a Pattern

The preg_replace() function replaces all occurrences of pattern with replacement and returns the modified result. It operates like the older ereg_replace() function (since removed from PHP), except that it uses Perl-compatible regular expression syntax. Its prototype follows:

mixed preg_replace(mixed pattern, mixed replacement, mixed str [, int limit])

The optional input parameter limit specifies the maximum number of replacements to perform. Omitting limit or setting it to -1 results in the replacement of all occurrences. Consider an example:

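<?php
    $text = "This is a link to http://www.wjgilmore.com/.";
    // Slashes inside the pattern are escaped because / is the delimiter;
    // ${0} in the replacement refers to the full text of the match
    echo preg_replace("/http:\/\/(.*)\//",
                      '<a href="${0}">${0}</a>', $text);
?>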

This returns the following:

——————————————–
This is a link to
<a href="http://www.wjgilmore.com/">http://www.wjgilmore.com/</a>.

——————————————–
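The effect of the limit parameter is easiest to see on a string with several matches. This is a minimal sketch using an invented sample sentence:

```php
<?php
$text = "one fish, two fish, red fish";

// With a limit of 1, only the first occurrence is replaced.
$first = preg_replace("/fish/", "cat", $text, 1);
echo $first . "\n"; // one cat, two fish, red fish

// With a limit of -1 (the default), every occurrence is replaced.
$all = preg_replace("/fish/", "cat", $text, -1);
echo $all . "\n";   // one cat, two cat, red cat
```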

Interestingly, the pattern and replacement input parameters can also be arrays. This function will cycle through each element of each array, making replacements as they are found. Consider this example, which could be marketed as a corporate report filter:

<?php
    $draft = "In 2007 the company faced plummeting revenues and scandal.";
    $keywords = array("/faced/", "/plummeting/", "/scandal/");
    $replacements = array("celebrated", "skyrocketing", "expansion");
    echo preg_replace($keywords, $replacements, $draft);
?>

This returns the following:

——————————————–
In 2007 the company celebrated skyrocketing revenues and expansion.

——————————————–

Creating a Custom Replacement Function

In some situations you might wish to replace strings based on a somewhat more complex set of criteria beyond what is provided by PHP’s default capabilities. For instance, consider a situation where you want to scan some text for acronyms such as IRS and insert the complete name directly following the acronym. To do so, you need to create a custom function and then use the function preg_replace_callback() to temporarily tie it into the language. Its prototype follows:

mixed preg_replace_callback(mixed pattern, callback callback, mixed str
                            [, int limit])

The pattern parameter determines what you’re looking for, while the str parameter defines the string you’re searching. The callback parameter defines the name of the function to be used for the replacement task. The optional parameter limit specifies how many matches should take place. Failing to set limit or setting it to -1 will result in the replacement of all occurrences. In the following example, a function named acronym() is passed into preg_replace_callback() and is used to insert the long form of various acronyms into the target string:

<?php

    // This function will add the acronym's long form
    // directly after any acronyms found in $matches
    function acronym($matches) {
        $acronyms = array(
            'WWW' => 'World Wide Web',
            'IRS' => 'Internal Revenue Service',
            'PDF' => 'Portable Document Format');

        if (isset($acronyms[$matches[1]]))
            return $matches[1] . " (" . $acronyms[$matches[1]] . ")";
        else
            return $matches[1];
    }

    // The target text
    $text = "The <acronym>IRS</acronym> offers tax forms in
             <acronym>PDF</acronym> format on the <acronym>WWW</acronym>.";

    // Add the acronyms' long forms to the target text
    // (the slash in the closing tag is escaped because / is the delimiter)
    $newtext = preg_replace_callback("/<acronym>(.*)<\/acronym>/U", 'acronym',
                                  $text);

    print_r($newtext);

?>

This returns the following:

——————————————–
The IRS (Internal Revenue Service) offers tax forms in
PDF (Portable Document Format) format on the WWW (World Wide Web).
——————————————–
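The callback does not have to be a named function. As an alternative sketch, assuming PHP 5.3 or later, the same replacement can be written with an anonymous function that receives the lookup table through a use clause. The shortened sample text is invented for this illustration.

```php
<?php
$acronyms = array(
    'WWW' => 'World Wide Web',
    'IRS' => 'Internal Revenue Service');

$text = "File with the <acronym>IRS</acronym> via the <acronym>WWW</acronym>.";

$newtext = preg_replace_callback(
    '/<acronym>(.*)<\/acronym>/U',
    // $matches[1] holds the text captured between the tags.
    function ($matches) use ($acronyms) {
        $key = $matches[1];
        return isset($acronyms[$key])
            ? $key . " (" . $acronyms[$key] . ")"
            : $key;
    },
    $text
);

echo $newtext;
```

Keeping the lookup table outside the callback means it is built once rather than on every match.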

Splitting a String into Various Elements Based on a Case-Insensitive Pattern

The preg_split() function operates like the older split() function (since removed from PHP), except that pattern is defined in terms of a Perl-compatible regular expression. Its prototype follows:

array preg_split(string pattern, string string [, int limit [, int flags]])

If the optional input parameter limit is specified, at most limit substrings are returned, with the final substring containing the remainder of the string. Consider an example:

<?php
    $delimitedText = "Jason+++Gilmore+++++++++++Columbus+++OH";
    // The plus sign is escaped because + is a regular expression quantifier
    $fields = preg_split("/\+{1,}/", $delimitedText);
    foreach($fields as $field) echo $field."<br />";
?>

This returns the following:

——————————————–
Jason
Gilmore
Columbus
OH

——————————————–
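The two optional parameters can be sketched as follows. The first call uses limit; the later calls use the PREG_SPLIT_NO_EMPTY flag to discard the empty strings produced by leading or trailing delimiters. The $padded string is a hypothetical variation on the example above.

```php
<?php
$delimitedText = "Jason+++Gilmore+++++++++++Columbus+++OH";

// A limit of 2 returns the first field plus the unsplit remainder.
$firstTwo = preg_split('/\++/', $delimitedText, 2);
print_r($firstTwo); // Jason, Gilmore+++++++++++Columbus+++OH

// Delimiters at either end produce empty strings by default...
$padded    = "+++Jason+++Gilmore+++";
$withEmpty = preg_split('/\++/', $padded);

// ...which PREG_SPLIT_NO_EMPTY removes (-1 means "no limit").
$noEmpty = preg_split('/\++/', $padded, -1, PREG_SPLIT_NO_EMPTY);
print_r($noEmpty);  // Jason, Gilmore
```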


Note  Later in this chapter, the section titled “Alternatives for Regular Expression Functions” offers several standard functions that can be used in lieu of regular expressions for certain tasks. In many cases, these alternative functions actually perform much faster than their regular expression counterparts.


Other String-Specific Functions

In addition to the regular expression–based functions discussed in the first half of this chapter, PHP offers more than 100 functions collectively capable of manipulating practically every imaginable aspect of a string. To introduce each function would be out of the scope of this book and would only repeat much of the information in the PHP documentation. This section is devoted to a categorical FAQ of sorts, focusing upon the string-related issues that seem to most frequently appear within community forums. The section is divided into the following topics:

  1. Determining string length
  2. Comparing two strings
  3. Manipulating string case
  4. Converting strings to and from HTML
  5. Alternatives for regular expression functions
  6. Padding and stripping a string
  7. Counting characters and words

Determining the Length of a String

Determining string length is a repeated action within countless applications. The PHP function strlen() accomplishes this task quite nicely. This function returns the length of a string in bytes, so each single-byte character counts as one unit (a multibyte character, such as an accented letter encoded in UTF-8, counts as more than one). Its prototype follows:

int strlen(string str)

The following example verifies whether a user password is of acceptable length:

<?php
    $pswd = "secretpswd";
    if (strlen($pswd) < 10)
        echo "Password is too short!";
    else
        echo "Password is valid!";
?>

In this case, the error message will not appear because the chosen password consists of ten characters, whereas the conditional expression checks whether the target string consists of fewer than ten characters.
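One caveat worth sketching: strlen() counts bytes rather than characters, so strings containing multibyte characters can report a larger length than expected. The example below spells out the UTF-8 bytes of an accented letter explicitly so that it does not depend on the file's encoding; the mb_strlen() call is guarded because it assumes the optional mbstring extension is installed.

```php
<?php
// "café" with the final letter written as its two UTF-8 bytes.
$word = "caf\xC3\xA9";

// Four characters, but five bytes.
$bytes = strlen($word);
echo $bytes . "\n"; // 5

// mb_strlen() counts characters, if the mbstring extension is available.
if (function_exists('mb_strlen')) {
    echo mb_strlen($word, 'UTF-8') . "\n"; // 4
}
```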

Comparing Two Strings

String comparison is arguably one of the most important features of the string-handling capabilities of any language. Although there are many ways in which two strings can be compared for equality, PHP provides four functions for performing this task: strcmp(), strcasecmp(), strspn(), and strcspn(). These functions are discussed in the following sections.

Comparing Two Strings Case Sensitively

The strcmp() function performs a binary-safe, case-sensitive comparison of two strings. Its prototype follows:

int strcmp(string str1, string str2)

It will return one of three possible values based on the comparison outcome:

  1. 0 if str1 and str2 are equal
  2. A negative value (typically -1) if str1 is less than str2
  3. A positive value (typically 1) if str2 is less than str1

Web sites often require a registering user to enter and then confirm a password, lessening the possibility of an incorrectly entered password as a result of a typing error. strcmp() is a great function for comparing the two password entries because passwords are often case sensitive:

<?php
    $pswd = "supersecret";
    $pswd2 = "supersecret2";

    if (strcmp($pswd,$pswd2) != 0)
        echo "Passwords do not match!";
    else
        echo "Passwords match!";
?>

Note that the strings must match exactly for strcmp() to consider them equal. For example, Supersecret is different from supersecret. If you’re looking to compare two strings case insensitively, consider strcasecmp(), introduced next.

Another common point of confusion regarding this function surrounds its behavior of returning 0 if the two strings are equal. This is different from executing a string comparison using the == operator, like so:

if ($str1 == $str2)

While both accomplish the same goal, which is to compare two strings, keep in mind that the values they return in doing so are different.
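The difference in return values can be sketched as follows. The variable names $cmp and $eq are chosen here for illustration.

```php
<?php
$pswd  = "supersecret";
$pswd2 = "supersecret";

// strcmp() returns the integer 0 when the strings are equal...
$cmp = strcmp($pswd, $pswd2);
var_dump($cmp); // int(0)

// ...whereas the == operator returns the boolean true.
$eq = ($pswd == $pswd2);
var_dump($eq);  // bool(true)

// Because 0 evaluates to false, "if (strcmp(...))" succeeds only when
// the strings DIFFER. Comparing against 0 makes the intent explicit.
if (strcmp($pswd, $pswd2) === 0)
    echo "Passwords match!";
```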

Comparing Two Strings Case Insensitively

The strcasecmp() function operates exactly like strcmp(), except that its comparison is case insensitive. Its prototype follows:

int strcasecmp(string str1, string str2)

The following example compares two e-mail addresses, an ideal use for strcasecmp() because case does not determine an e-mail address’s uniqueness:

<?php
    $email1 = "admin@example.com";
    $email2 = "ADMIN@example.com";

    // strcasecmp() returns 0 when the strings match, so negate the result
    if (! strcasecmp($email1, $email2))
        echo "The email addresses are identical!";
?>

In this example, the message is output because strcasecmp() performs a case-insensitive comparison of $email1 and $email2 and determines that they are indeed identical.

Calculating the Similarity Between Two Strings

The strspn() function returns the length of the first segment in a string containing characters also found in another string. Its prototype follows:

int strspn(string str1, string str2)

Here’s how you might use strspn() to ensure that a password does not consist solely of numbers:

<?php
    $password = "3312345";
    if (strspn($password, "1234567890") == strlen($password))
        echo "The password cannot consist solely of numbers!";
?>

In this case, the error message is returned because $password does indeed consist solely of digits.

Calculating the Difference Between Two Strings

The strcspn() function returns the length of the first segment of a string containing characters not found in another string. Its prototype follows:

int strcspn(string str1, string str2)

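Here’s an example of password validation using strcspn():

<?php
    $password = "a12345";
    if (strcspn($password, "1234567890") == 0) {
        echo "Password cannot consist solely of numbers!";
    }
?>

In this case, the error message will not be displayed because $password begins with a letter, so strcspn() returns 1 rather than 0. Note that comparing the result to 0 detects only a leading digit: a password such as 1abc would also trigger the message, even though it does not consist solely of numbers.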
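To contrast the two functions, here is a small sketch that wraps each in a helper. The function names isAllDigits() and hasNoDigits() are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: true when $str is non-empty and every
// character is a digit (the leading run of digits spans the string).
function isAllDigits($str) {
    return $str !== "" && strspn($str, "0123456789") == strlen($str);
}

// Hypothetical helper: true when $str contains no digit at all
// (the leading run of non-digits spans the whole string).
function hasNoDigits($str) {
    return strcspn($str, "0123456789") == strlen($str);
}

var_dump(isAllDigits("3312345")); // bool(true)
var_dump(isAllDigits("a12345"));  // bool(false)
var_dump(isAllDigits("1abc"));    // bool(false)
var_dump(hasNoDigits("secret"));  // bool(true)
var_dump(hasNoDigits("a12345"));  // bool(false)
```

Comparing the returned length to strlen() tests the whole string, whereas comparing it to 0 only tests the first character.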

Manipulating String Case

Four functions are available to aid you in manipulating the case of characters in a string: strtolower(), strtoupper(), ucfirst(), and ucwords(). These functions are discussed in this section.

Converting a String to All Lowercase

The strtolower() function converts a string to all lowercase letters, returning the modified string. Nonalphabetical characters are not affected. Its prototype follows:

string strtolower(string str)

The following example uses strtolower() to convert a URL to all lowercase letters:

<?php
    $url = "http://WWW.EXAMPLE.COM/";
    echo strtolower($url);
?>

This returns the following:

——————————————–
http://www.example.com/

——————————————–

Converting a String to All Uppercase

Just as you can convert a string to lowercase, you can convert it to uppercase. This is accomplished with the function strtoupper(). Its prototype follows:

string strtoupper(string str)

Nonalphabetical characters are not affected. This example uses strtoupper() to convert a string to all uppercase letters:

<?php
    $msg = "I annoy people by capitalizing e-mail text.";
    echo strtoupper($msg);
?>

This returns the following:

——————————————–
I ANNOY PEOPLE BY CAPITALIZING E-MAIL TEXT.

——————————————–

Capitalizing the First Letter of a String

The ucfirst() function capitalizes the first letter of the string str, if it is alphabetical. Its prototype follows:

string ucfirst(string str)

Nonalphabetical characters will not be affected. Additionally, any capitalized characters found in the string will be left untouched. Consider this example:

<?php
    $sentence = "the newest version of PHP was released today!";
    echo ucfirst($sentence);
?>

This returns the following:

——————————————–
The newest version of PHP was released today!
——————————————–

Note that while the first letter is indeed capitalized, the capitalized word PHP was left untouched.

Capitalizing Each Word in a String

The ucwords() function capitalizes the first letter of each word in a string. Its prototype follows:

string ucwords(string str)

Nonalphabetical characters are not affected. This example uses ucwords() to capitalize each word in a string:

<?php
    $title = "O’Malley wins the heavyweight championship!";
    echo ucwords($title);
?>

This returns the following:

——————————————–
O’Malley Wins The Heavyweight Championship!
——————————————–

Note that if O’Malley were accidentally written as O’malley, ucwords() would not catch the error, because by default it treats a word as a string of characters separated from other entities in the string by whitespace on each side.
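More recent PHP versions (5.4.32, 5.5.16, and later) accept an optional second argument to ucwords() listing the characters that separate words. The sketch below assumes such a version and uses a plain ASCII apostrophe in the sample title, since the delimiters are matched byte by byte.

```php
<?php
$title = "o'malley wins the heavyweight championship!";

// Default behavior: only whitespace separates words.
$default = ucwords($title);
echo $default . "\n"; // O'malley Wins The Heavyweight Championship!

// Treat both the space and the apostrophe as word separators.
$custom = ucwords($title, " '");
echo $custom . "\n";  // O'Malley Wins The Heavyweight Championship!
```

Supplying the second argument replaces the default separator list, so the space must be included explicitly.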

Please check back next week for the continuation of this article.
