
Character encoding - PHP

This article describes the basics of the PHP scripting language, which is very easy to learn if you are familiar with any programming language. It is excerpted from chapter two of the book Web Database Applications with PHP and MySQL, written by Hugh E. Williams and David Lane (O'Reilly, 2004; ISBN: 0596005431).

  1. The PHP Scripting Language
  2. Creating PHP scripts
  3. Character encoding
  4. Expressions, Operators, and Variable Assignment
  5. switch Statement
  6. Changing Loop Behavior
  7. Automatic Type Conversion
  8. User-Defined Functions
  9. Static variables
  10. Managing include files
By: O'Reilly Media
September 29, 2005



When a PHP script is executed, the PHP engine starts by reading the script from a file. A file is simply a sequence of characters that are interpreted by PHP as statements, variable identifiers, literal strings, HTML, and so on. To correctly interpret these characters, PHP needs to know the character encoding of the file. Put more simply, PHP needs to know what each 8-bit sequence that makes up a character means.

In many cases, you won’t need to worry about character encoding. By default, PHP reads characters encoded to the ISO-8859-1 standard, which is identical to 7-bit ASCII for the first 128 characters. ISO-8859-1, also known as Latin-1, uses the next 128 values to represent characters used in Western European languages. By default, PHP scripts can include ISO-8859-1 characters directly, as the following fragment demonstrates:

$gesprächsnotiz =
    "von Paulus Esterházy und Markus Hoff-Holtmannus";

The ä and á characters in the previous example are represented by the 8-bit sequences 11100100 and 11100001, which are character codes 228 and 225 in ISO-8859-1.
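These values can be checked with a short sketch. The fragment below writes the two characters as hexadecimal escapes so that it does not depend on the encoding of the script file, then uses the built-in ord() and decbin() functions to show each byte. The variable names are illustrative only.

```php
<?php
// ISO-8859-1 bytes for the two accented characters, written as
// escapes so the script file itself stays plain 7-bit ASCII.
$aUmlaut = "\xe4";  // a with diaeresis
$aAcute  = "\xe1";  // a with acute accent

// ord() returns the byte value; decbin() formats it in binary.
print ord($aUmlaut) . " = " . decbin(ord($aUmlaut)) . "\n";  // 228 = 11100100
print ord($aAcute) . " = " . decbin(ord($aAcute)) . "\n";    // 225 = 11100001
```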

Sometimes, it’s not convenient to work with non-7-bit ASCII characters in an editor environment. Indeed, some programs can only handle 7-bit ASCII and ignore high-bit characters, that is, characters whose leading bit is 1. You can include high-bit characters using an escape sequence to specify either a hexadecimal or octal value. Hexadecimal sequences start with \x and are followed by two digits, 00 to ff, to represent 256 characters. For example, the á character can be represented in a string literal with the hexadecimal sequence \xe1, since e1 is the hexadecimal equivalent of 11100001:

    "von Paulus Esterh\xe1zy und Markus Hoff-Holtmannus";

Escape sequences can only be used in string literals; PHP does not allow us to represent the variable $gesprächsnotiz as $gespr\xe4chsnotiz.
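The text mentions octal escapes but shows only the hexadecimal form. As a minimal sketch, the fragment below writes the same byte three ways and checks that the resulting strings are identical. The variable names are illustrative, and the single-quoted comparison at the end is an addition to the discussion above: escape sequences of this kind are interpreted in double-quoted literals only.

```php
<?php
// The same ISO-8859-1 byte (225) written three ways.
$hex   = "Esterh\xe1zy";              // hexadecimal escape
$octal = "Esterh\341zy";              // octal escape: 341 octal = 225 decimal
$built = "Esterh" . chr(225) . "zy";  // built from the byte value

var_dump($hex === $octal);  // bool(true)
var_dump($hex === $built);  // bool(true)

// A single-quoted literal keeps the backslash sequence as typed.
var_dump(strlen("\xe1"));   // int(1)
var_dump(strlen('\xe1'));   // int(4)
```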

Like PHP’s Zend engine, browsers need to know the character encoding of a page before the page can be correctly displayed. In this book we assume the default ISO-8859-1 character encoding, and accordingly we instruct browsers to use this encoding by including the mark-up as follows:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

Other ISO-8859-x character encoding standards allow Cyrillic, Arabic, Greek, and Hebrew characters to be encoded, and a full description of these encoding standards can be found at http://en.wikipedia.org/wiki/ISO_8859.

PHP can be configured to support UTF-8, an 8-bit encoding method that can represent Unicode characters. The Unicode Standard describes a universal character encoding that defines over 49,000 characters from the world’s scripts. Unicode characters can also be encoded using UTF-16, a 16-bit encoding; however, PHP does not support 16-bit characters. More information about the Unicode standard can be found at http://www.unicode.org.
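As a rough illustration of how the two encodings differ, the sketch below writes á as explicit byte sequences: one byte in ISO-8859-1 and two bytes in UTF-8. Because strlen() counts bytes rather than characters, the two strings report different lengths. The byte values are standard; the variable names are only for illustration.

```php
<?php
$latin1 = "\xe1";      // a-acute in ISO-8859-1: one byte
$utf8   = "\xc3\xa1";  // a-acute (U+00E1) in UTF-8: two bytes

// strlen() counts bytes, so the two encodings give different lengths.
print strlen($latin1) . "\n";  // 1
print strlen($utf8) . "\n";    // 2
```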


Variables in PHP are identified by a dollar sign followed by the variable name. Variables don’t need to be declared before you use them; normally you just assign them a value to create them. The following code fragment shows a variable $var assigned the integer 15. Therefore, $var is defined as being of type integer.

$var = 15;

Variables in PHP are simple: when they are used, the type is implicitly defined—or redefined—and the variable implicitly declared.

Variable names are case-sensitive in PHP, so $Variable, $variable, $VAriable, and $VARIABLE are all different variables.
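A small sketch of this behavior, using two names that differ only in case (the values are arbitrary):

```php
<?php
// Two distinct variables that differ only in case.
$variable = "lowercase";
$Variable = "capitalized";

print $variable . "\n";  // lowercase
print $Variable . "\n";  // capitalized
```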

One of the most common sources of bugs in PHP is failing to detect that more than one variable has accidentally been created. The flexibility of PHP is a great feature but is also dangerous. We discuss in Chapter 14 how to set the error reporting of PHP so that it detects this type of error.
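The kind of mistake described above might look like the following sketch, in which a hypothetical typo ($totl for $total) quietly creates a second variable instead of updating the first:

```php
<?php
$total = 10;

// Typo: this was meant to update $total, but it silently
// creates a second variable named $totl instead.
$totl = $total + 5;

print $total . "\n";     // 10 (unchanged)
var_dump(isset($totl));  // bool(true)
```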


Data exists in different types so that appropriate operations can be performed on it. For instance, numeric values can be manipulated with arithmetic operators such as addition and subtraction, whereas strings of characters can be manipulated by operations such as converting to uppercase. In this section, we introduce the basic types; their importance will become clear as we use data in more and more complex operations.

PHP has four scalar types—boolean, float, integer, and string—and two compound types, array and object. PHP also supports null—a special type that is used when a variable doesn’t have a value.

Variables of a scalar type contain a single value. Variables of a compound type—array or object—are made up of multiple scalar values or other compound values.

Arrays are discussed in detail in the next chapter, and objects are discussed in Chapter 4. Other aspects of variables—including global variables and scope—are discussed later in this chapter.

Boolean variables are as simple as they get: they can be assigned either true or false. Here are two example assignments of a Boolean variable:

$variable = false;
$test = true;

An integer is a whole number, while a float is a number that has an exponent and mantissa. The number 123.01 is a float, and so is 123.0, while the number 123 is an integer. Consider the following two examples:

// This is an integer
$var1 = 6;
// This is a float
$var2 = 6.0;

A float can also be represented using an exponential notation:

// This is a float that equals 1120
$var3 = 1.12e3;
// This is a float that equals 0.02
$var4 = 2e-2;
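The types of these four values can be confirmed with the built-in is_int() and is_float() functions, as in this short sketch:

```php
<?php
$var1 = 6;       // integer
$var2 = 6.0;     // float
$var3 = 1.12e3;  // float, equals 1120
$var4 = 2e-2;    // float, equals 0.02

var_dump(is_int($var1));    // bool(true)
var_dump(is_float($var2));  // bool(true)
var_dump($var3 == 1120);    // bool(true)
var_dump($var4 == 0.02);    // bool(true)
```

Note that gettype() reports a float as "double", for historical reasons.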

You’ve already seen examples of strings earlier in the chapter. Here are two more example string variables:

$variable = "This is a string";
$test = 'This is also a string';

Along with the value, the type of a variable can change over the lifetime of the variable. Consider an example:

$var = 15;
$var = "Sarah the Cat";

This fragment is acceptable in PHP. The type of $var changes from integer to string as the variable is reassigned. Letting PHP change the type of a variable as the context changes is very flexible and a little dangerous. Later in Working with Types, we show ways to avoid problems that can arise with loosely typed variables.
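The change of type can be observed with the built-in gettype() function, as in this minimal sketch:

```php
<?php
$var = 15;
print gettype($var) . "\n";  // integer

$var = "Sarah the Cat";
print gettype($var) . "\n";  // string
```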


Constants associate a name with a scalar value. For example, the Boolean values true and false are constants associated with the values 1 and 0, respectively. It’s also common to declare constants in a script. Consider this example constant declaration:

define("PI", 3.14159);
// This outputs 3.14159
print PI;

Constants aren’t preceded by a $ character. They can’t be changed once they have been defined and they can be accessed anywhere in a script (regardless of where they are declared).

Constants are useful because they allow parameters internal to the script to be grouped. When one parameter changes—for example, if you define a new maximum number of lines per web page—you can alter this constant parameter in only one place and not throughout the code.
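As a sketch of the maximum-lines scenario, the fragment below defines the limit once and reads it inside a function. MAX_LINES, its value of 25, and linesRemaining() are hypothetical names chosen for illustration, not part of PHP or of the original text.

```php
<?php
// MAX_LINES and linesRemaining() are hypothetical names.
define("MAX_LINES", 25);

function linesRemaining($linesUsed)
{
    // Constants are visible here without any global declaration.
    return MAX_LINES - $linesUsed;
}

print linesRemaining(10) . "\n";  // 15
```

Changing the limit then means editing only the define() call.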

PHP has a large number of built-in constants that a script can use. For example, the library of mathematical functions already includes a definition of M_PI to hold the constant pi:

// M_PI is defined as 3.14159265358979323846; the number of
// digits printed depends on PHP's precision setting
print M_PI;

By convention, constant names use uppercase characters, and predefined constants are often named to indicate the associated library. For example, the constants defined for the mathematical functions library all start with M_. We introduce predefined constants as needed throughout this book.
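A brief sketch using two of the M_ constants follows. The radius value and variable names are arbitrary, and number_format() is used only to round the output to a fixed number of decimal places.

```php
<?php
$radius = 2;
$area = M_PI * $radius * $radius;

// number_format() rounds to the given number of decimal places.
print number_format($area, 2) . "\n";    // 12.57
print number_format(M_SQRT2, 4) . "\n";  // 1.4142
```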

