Strings: Trimming and Counting

In this conclusion to a five-part series on strings and regular expressions in PHP, you’ll learn about padding and stripping a string, trimming characters from a string, counting the characters in a string, and more. This article is excerpted from chapter nine of the book Beginning PHP and Oracle: From Novice to Professional, written by W. Jason Gilmore and Bob Bryla (Apress; ISBN: 1590597702).

Padding and Stripping a String

For formatting reasons, you sometimes need to modify the string length via either padding or stripping characters. PHP provides a number of functions for doing so. This section examines many of the commonly used functions.

Trimming Characters from the Beginning of a String

The ltrim() function removes various characters from the beginning of a string. By default these are the ordinary space, horizontal tab (\t), newline (\n), carriage return (\r), NULL (\0), and vertical tab (\x0B). Its prototype follows:

string ltrim(string str [, string charlist])

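To strip a different set of characters, list them in the optional parameter charlist. Note that charlist replaces the default set rather than adding to it, and that a range of characters can be given with .., as in "a..z".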
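Here is a small sketch of both behaviors. The sample strings are hypothetical, and the expected results appear as comments.

```php
<?php
// Default set: leading space, \t, \n, \r, \0 and \x0B are stripped.
$raw = "  \t\nHello world  ";
$left = ltrim($raw);
echo "[" . $left . "]\n"; // prints [Hello world  ] (trailing spaces are kept)

// A charlist replaces the default set, so only leading zeros are removed.
$id = ltrim("000428", "0");
echo $id . "\n"; // prints 428

// A range such as "0..9" strips any leading digit.
$code = ltrim("2024-report", "0..9");
echo $code . "\n"; // prints -report
```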

Trimming Characters from the End of a String

The rtrim() function operates identically to ltrim(), except that it removes the designated characters from the right side of a string. Its prototype follows:

string rtrim(string str [, string charlist])
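A few typical uses of rtrim() follow, again with hypothetical input values. The nested call is one way to tidy a decimal string; it assumes the string always contains a decimal point, since otherwise a value such as "100" would lose its zeros.

```php
<?php
// Hypothetical input: a CSV header line read with a Windows-style line ending.
$line = rtrim("Name,Email\r\n");
echo "[" . $line . "]\n"; // prints [Name,Email]

// Drop trailing zeros from a decimal string, then any dangling decimal point.
// Only safe on strings that contain a ".", or "100" would become "1".
$amount = rtrim(rtrim("12.500", "0"), ".");
$whole = rtrim(rtrim("3.000", "0"), ".");
echo $amount . " " . $whole . "\n"; // prints 12.5 3

// Remove a trailing directory separator from a path.
$dir = rtrim("/var/www/html/", "/");
echo $dir . "\n"; // prints /var/www/html
```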

Trimming Characters from Both Sides of a String

You can think of the trim() function as a combination of ltrim() and rtrim(): it removes the designated characters from both sides of a string. Its prototype follows:

string trim(string str [, string charlist])
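As a quick illustration with hypothetical input, trim() is commonly applied to form data before validation. The last lines check that it gives the same result as ltrim() followed by rtrim().

```php
<?php
// Normalize a hypothetical form field before validating or storing it.
$email = trim("   alice@example.com \n");
echo "[" . $email . "]\n"; // prints [alice@example.com]

// With a charlist, characters are removed from both ends only;
// the hyphens in the middle of the string are left alone.
$slug = trim("--my-post-title--", "-");
echo $slug . "\n"; // prints my-post-title

// trim() matches applying ltrim() and then rtrim().
$s = "\t padded \t";
var_dump(trim($s) === rtrim(ltrim($s))); // prints bool(true)
```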

Padding a String

The str_pad() function pads a string to a specified length. Its prototype follows:

string str_pad(string str, int length [, string pad_string [, int pad_type]])

If the optional parameter pad_string is not defined, str will be padded with blank spaces; otherwise, it will be padded with the character pattern specified by pad_string. By default, the string will be padded to the right; however, the optional parameter pad_type may be assigned the values STR_PAD_RIGHT, STR_PAD_LEFT, or STR_PAD_BOTH, padding the string accordingly. This example shows how to pad a string using str_pad():

echo str_pad("Salad", 10)." is good.";

This returns the following:

Salad      is good.

This example makes use of str_pad()’s optional parameters:

$header = "Log Report";
echo str_pad ($header, 20, "=+", STR_PAD_BOTH);

This returns the following:

=+=+=Log Report=+=+=

Note that str_pad() truncates the pattern defined by pad_string if length is reached before completing an entire repetition of the pattern.
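The following sketch, using hypothetical values, exercises left padding, pattern truncation, uneven centering, and the case in which length does not exceed the string’s current length, which leaves the string unchanged.

```php
<?php
// Zero-pad a hypothetical invoice number on the left to a width of 6.
$invoice = str_pad("427", 6, "0", STR_PAD_LEFT);
echo $invoice . "\n"; // prints 000427

// Five pad characters are needed, so the "xyz" pattern is cut off after "xy".
$partial = str_pad("abc", 8, "xyz");
echo $partial . "\n"; // prints abcxyzxy

// length is not greater than strlen("Salad"), so nothing is padded;
// str_pad() never shortens its input.
$unchanged = str_pad("Salad", 3);
echo $unchanged . "\n"; // prints Salad

// With STR_PAD_BOTH and an odd number of pad characters, the extra one goes right.
$centered = str_pad("ab", 5, "*", STR_PAD_BOTH);
echo $centered . "\n"; // prints *ab**
```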

Counting Characters and Words

It’s often useful to determine the total number of characters or words in a given string. Although PHP’s considerable string-parsing capabilities have long made this task trivial, two functions were recently added that formalize the process. Both functions are introduced in this section.

Counting the Number of Characters in a String

The function count_chars() offers information regarding the characters found in a string. Its prototype follows:

mixed count_chars(string str [, int mode])

Its behavior depends on how the optional parameter mode is defined:

0: Returns an array with one entry for every byte value from 0 to 255, where the key is the byte value and the value is its frequency in str, including byte values whose frequency is zero. This is the default.

1: Same as 0, but returns only those byte values with a frequency greater than zero.

2: Same as 0, but returns only those byte values with a frequency of zero.

3: Returns a string containing each distinct byte value found in str.

4: Returns a string containing all byte values not found in str.
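To see how the modes differ, this sketch runs count_chars() on a short hypothetical word. Because the keys returned in modes 0 through 2 are byte values, chr() converts them back to characters for display.

```php
<?php
$word = "banana";

// Mode 0 (default): one entry per byte value 0-255, zero counts included.
$all = count_chars($word);
echo count($all) . "\n"; // prints 256

// Mode 1: only bytes that occur, keyed by byte value in ascending order.
$used = count_chars($word, 1);
foreach ($used as $byte => $n) {
    echo chr($byte) . " => " . $n . "\n"; // prints a => 3, then b => 1, then n => 2
}

// Mode 3: a string of the distinct bytes, in byte-value order.
$distinct = count_chars($word, 3);
echo $distinct . "\n"; // prints abn

// Mode 4: the 256 - 3 = 253 byte values that do not occur.
echo strlen(count_chars($word, 4)) . "\n"; // prints 253
```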

The following example counts the frequency of each character in $sentence:

$sentence = "The rain in Spain falls mainly on the plain";

// Retrieve located characters and their corresponding frequency.
$chart = count_chars($sentence, 1);

// Keys are byte values, so chr() turns each one back into a character.
foreach ($chart as $letter => $frequency) {
    echo "Character " . chr($letter) . " appears $frequency times<br />";
}

This returns the following:

Character appears 8 times
Character S appears 1 times
Character T appears 1 times
Character a appears 5 times
Character e appears 2 times
Character f appears 1 times
Character h appears 2 times
Character i appears 5 times
Character l appears 4 times
Character m appears 1 times
Character n appears 6 times
Character o appears 1 times
Character p appears 2 times
Character r appears 1 times
Character s appears 1 times
Character t appears 1 times
Character y appears 1 times

Counting the Total Number of Words in a String

The function str_word_count() offers information regarding the total number of words found in a string. Its prototype follows:

mixed str_word_count(string str [, int format])

If the optional parameter format is not defined, it will simply return the total number of words. If format is defined, it modifies the function’s behavior based on its value:

1: Returns an array consisting of all words located in str.

2: Returns an associative array, where the key is the numeric position (offset) at which the word starts in str, and the value is the word itself.
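A short sketch of the three output styles, using a hypothetical three-word string. Note that digits are not treated as word characters, so a token such as 5 is not counted.

```php
<?php
$text = "Salad is good";

// No format argument: just the number of words.
echo str_word_count($text) . "\n"; // prints 3

// Format 1: a list of the words (keys 0, 1, 2 map to Salad, is, good).
print_r(str_word_count($text, 1));

// Format 2: keys are the offsets where each word starts (0, 6, 9).
$positions = str_word_count($text, 2);
print_r($positions);

// Digits are not word characters, so only "PHP" and "rocks" count here.
echo str_word_count("PHP 5 rocks") . "\n"; // prints 2
```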

Consider an example:

$summary = <<<summary
In the latest installment of the ongoing Developer.com PHP series,
I discuss the many improvements and additions to PHP 5's
object-oriented architecture.
summary;
$words = str_word_count($summary);
printf("Total words in summary: %d", $words);

This returns the following:

Total words in summary: 23

You can use this function in conjunction with array_count_values() to determine how often each word appears within the string:

$summary = <<<summary
In the latest installment of the ongoing Developer.com PHP series,
I discuss the many improvements and additions to PHP 5's
object-oriented architecture.
summary;
$words = str_word_count($summary, 2);
$frequency = array_count_values($words);
print_r($frequency);

This returns the following:

Array ( [In] => 1 [the] => 3 [latest] => 1 [installment] => 1 [of] => 1
[ongoing] => 1 [Developer] => 1 [com] => 1 [PHP] => 2 [series] => 1
[I] => 1 [discuss] => 1 [many] => 1 [improvements] => 1 [and] => 1
[additions] => 1 [to] => 1 ['s] => 1 [object-oriented] => 1
[architecture] => 1 )
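Because array_count_values() is case-sensitive, The and the would be tallied as different words. A minimal sketch, assuming you want case-insensitive counts listed from most to least frequent, lowercases the text with strtolower() first and then sorts with arsort(); the sample text is hypothetical.

```php
<?php
$summary = "PHP is fun. The PHP manual documents PHP strings. The manual is free.";

// Lowercase first so that "The" and "the" are counted as one word.
$words = str_word_count(strtolower($summary), 1);
$frequency = array_count_values($words);

// Sort by count, highest first; arsort() keeps each word => count pair intact.
arsort($frequency);
print_r($frequency);
```

Sorting with arsort() rather than rsort() matters here, since rsort() would discard the words and keep only the counts.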

Taking Advantage of PEAR: Validate_US

Regardless of whether your Web application is intended for use in banking, medical, IT, retail, or some other industry, chances are that certain data elements will be commonplace. For instance, it’s conceivable you’ll be tasked with inputting and validating a telephone number or a state abbreviation, regardless of whether you’re dealing with a client, a patient, a staff member, or a customer. Such repeatability certainly presents the opportunity to create a library that is capable of handling such matters, regardless of the application. Indeed, because we’re faced with such repeatable tasks, it follows that other programmers are, too. Therefore, it’s always prudent to investigate whether somebody has already done the hard work for you and made a package available via PEAR.

Note: If you’re unfamiliar with PEAR, take some time to review Chapter 11 before continuing.

Sure enough, a quick PEAR search turns up Validate_US , a package that is capable of validating various informational items specific to the United States. Although still in beta at press time, Validate_US was already capable of syntactically validating phone numbers, SSNs, state abbreviations, and ZIP codes. This section shows you how to install and implement this immensely useful package.

Installing Validate_US

To take advantage of Validate_US, you need to install it. The process for doing so follows:

%>pear install -f Validate_US
WARNING: failed to download within preferred state "stable", will instead download version 0.5.2, stability "beta"
downloading Validate_US-0.5.2.tgz … Starting to download Validate_US-0.5.2.tgz (6,578 bytes)
…..done: 6,578 bytes
install ok: channel://

Note that because Validate_US is a beta release (at the time of this writing), you need to pass the -f option to the install command in order to force installation.

Using Validate_US

The Validate_US package is extremely easy to use: simply instantiate the Validate_US class and call the appropriate validation method. In total there are seven methods, four of which are relevant to this discussion:

phoneNumber(): Validates a phone number, returning TRUE on success and FALSE otherwise. It accepts phone numbers in a variety of formats, including xxx xxx-xxxx, (xxx) xxx-xxxx, and similar combinations without dashes, parentheses, or spaces. For example, (614)999-9999, 6149999999, and (614)9999999 are all valid, whereas (6149999999, 614-999-9999, and 614999 are not.

postalCode(): Validates a ZIP code, returning TRUE on success and FALSE otherwise. It accepts ZIP codes in a variety of formats, including xxxxx, xxxxxxxxx, xxxxx-xxxx, and similar combinations without the dash. For example, 43210 and 43210-0362 are both valid, whereas 4321 and 4321009999 are not.

region(): Validates a state abbreviation, returning TRUE on success and FALSE otherwise. It accepts two-letter state abbreviations as supported by the U.S. Postal Service (ncsc/lookups/usps_abbreviations.html). For example, OH, CA, and NY are all valid, whereas CC, DUI, and BASF are not.

ssn(): Validates an SSN by not only checking the SSN syntax but also reviewing validation information made available via the Social Security Administration Web site, returning TRUE on success and FALSE otherwise. It accepts SSNs in a variety of formats, including xxx-xx-xxxx, xxx xx xxxx, xxx/xx/xxxx, xxx\txx\txxxx (\t = tab), xxx\nxx\nxxxx (\n = newline), or any nine-digit combination thereof involving dashes, spaces, forward slashes, tabs, or newline characters. For example, 479-35-6432 and 591467543 are valid, whereas 999999999, 777665555, and 45678 are not.

Once you have an understanding of the method definitions, implementation is trivial. For example, suppose you want to validate a phone number. Just include the Validate_US class and call phoneNumber() like so:

include "Validate/US.php";
$validate = new Validate_US();
// (614)999-9999 is one of the formats listed above as valid.
echo $validate->phoneNumber("(614)999-9999") ? "Valid!" : "Not valid!";

Because phoneNumber() returns a Boolean, the ternary operator can use its result directly, so in this example the Valid! message is displayed. Contrast this with supplying 614-876530932 to phoneNumber(), which displays Not valid!, informing the user of an invalid phone number.


Many of the functions introduced in this chapter will be among the most commonly used within your PHP applications, as they form the crux of the language’s string-manipulation capabilities.

In the next chapter, we examine another set of well-worn functions: those devoted to working with the file and operating system.   
