Basic Data Types and Calculations - Working with Characters
Variables of type char are primarily used to store a code for a single character and occupy 1 byte in memory. The C++ standard doesn’t specify the character encoding to be used for representing the basic character set, so this is determined by a particular compiler. It’s typically, but not exclusively, ASCII.
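If you want to confirm these two points for your own compiler, a minimal sketch along the following lines may help. The function name looks_like_ascii is invented for this illustration, and the handful of characters it samples is an arbitrary choice, not a complete test of the encoding.

```cpp
#include <climits>

// sizeof(char) is 1 by definition; CHAR_BIT is the number of bits in that byte.
static_assert(sizeof(char) == 1, "char always occupies exactly one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: returns true if a few sample characters have the
// codes that ASCII assigns to them. This is a spot check, not a full proof.
constexpr bool looks_like_ascii() {
    return 'A' == 65 && 'a' == 97 && '0' == 48 && ' ' == 32;
}
```

On most current desktop compilers looks_like_ascii() should return true, but nothing in the language guarantees it.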
You declare variables of type char in the same way as variables of the other types that you’ve seen, for example:
char letter;
char yes, no;
The first statement declares a single variable of type char with the name letter. The second statement declares two variables of type char having the names yes and no. Each of these variables can store the code for a single character. Because you haven’t provided initial values for these variables, they’ll contain junk values.
Character Literals
When you declare a variable of type char, you can initialize it with a character literal. You write a character literal as the character that you require between single quotes. For example, 'z', '3', and '?' are all character literals.
Some characters are problematical to enter as literals. Obviously, a single quote presents a bit of a difficulty because it’s a delimiter for a character literal. In fact, it isn’t legal in C++ to put either a single quote or a backslash character between single quotes. Control characters such as newline and tab are also a problem because they result in an effect when you press the key for the appropriate character rather than entering the character as data. You can specify all of these problem characters by using escape sequences that begin with a backslash, as shown in Table 2-13.
To specify a character literal corresponding to any of these characters, you just type in the corresponding escape sequence between single quotes. For instance, newline is '\n' and backslash is '\\'.
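As a rough illustration of the escape sequences for the problem characters mentioned above, the following sketch collects four of them in a string. The function name escaped_sample is made up for this example, and std::string is used only as a convenient container.

```cpp
#include <string>

// Hypothetical helper: builds a four-character string from escaped literals.
std::string escaped_sample() {
    std::string s;
    s += '\'';   // single quote
    s += '\\';   // backslash
    s += '\t';   // horizontal tab
    s += '\n';   // newline
    return s;
}
```

Each literal contributes exactly one character to the string, even though you type two characters between the quotes.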
There are also escape sequences that you can use to specify a character by its code expressed as either an octal or a hexadecimal value. The escape sequence for an octal character code is one to three octal digits preceded by a backslash. The escape sequence for a hexadecimal character code is one or more hexadecimal digits preceded by \x. You write both forms between single quotes when you want to define a character literal. For example, the letter 'A' could be written as hexadecimal '\x41' or octal '\101' in US-ASCII code. Obviously, you could write codes that won’t fit within a single byte, in which case the result is implementation defined.
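A short sketch of the three spellings side by side, assuming your compiler uses US-ASCII codes. The variable names and the helper same_letter are invented for the illustration.

```cpp
// With US-ASCII codes, all three literals denote the same character.
constexpr char from_plain = 'A';
constexpr char from_hex   = '\x41';  // hexadecimal 41 is decimal 65
constexpr char from_octal = '\101';  // octal 101 is decimal 65

// Hypothetical helper: true when the three spellings agree.
constexpr bool same_letter() {
    return from_plain == from_hex && from_hex == from_octal;
}
```

The two escaped forms always have the value 65. Whether that value also equals 'A' depends on the character encoding in use.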
If you write a character literal with more than one character between the single quotes and the characters don’t represent an escape sequence ('abc' is an example), then the literal is described as a multicharacter literal and will be of type int. The numerical value of such a literal is implementation defined but will usually be the result of placing the 1-byte codes for the characters in successive bytes of the int value. If you specify a multicharacter literal with more than four characters, this will usually result in an error message from the compiler.
You now know enough about character literals to initialize your variables of type char properly.
Initializing char Variables
You can define and initialize a variable of type char with the statement
char letter = 'A'; // Stores a single letter 'A'
This statement defines the variable with the name letter to be of type char with an initial value 'A'. If your compiler represents characters using US-ASCII codes, this will have the decimal value 65.
You can declare and initialize multiple variables in a single statement:
char yes = 'y', no = 'n', tab = '\t';
Because you can treat variables of type char as integers, you could equally well declare and initialize the variable letter with this statement:
char letter = 65; // Stores the ASCII code for 'A'
Remember that type char may be signed or unsigned by default, depending on the compiler, so this will affect what numerical values can be accommodated. If char is unsigned, values can be from 0 to 255. If it’s signed, values can be from –128 to +127. Of course, the range of bit patterns that can be stored is the same in both cases. They’re just interpreted differently.
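One way to find out which choice your compiler makes is to ask the standard library. This is a minimal sketch that assumes an 8-bit byte. The three function names are invented for the illustration, and std::numeric_limits comes from the standard <limits> header.

```cpp
#include <limits>

// Hypothetical helper: true if plain char behaves as a signed type here.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Smallest and largest values of plain char, widened to int so that
// they print as numbers rather than as characters.
constexpr int char_min() { return std::numeric_limits<char>::min(); }
constexpr int char_max() { return std::numeric_limits<char>::max(); }
```

With a signed 8-bit char you would expect char_min() to be -128 and char_max() to be 127. With an unsigned 8-bit char you would expect 0 and 255.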
Of course, you can use the variable letter as an operand in integer operations, so you can write
letter += 2;
This will result in the value stored in letter being incremented to 67, which is 'C' in US-ASCII. You can find all the US-ASCII codes in Appendix A of this book.
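The same arithmetic can be wrapped in a small function if you want to experiment with different offsets. This is only a sketch: shift_letter is a made-up name, and the comments assume US-ASCII codes.

```cpp
// Hypothetical helper: advances a character code by a number of positions.
char shift_letter(char letter, int offset) {
    letter += offset;   // the sum is computed as int, then stored back as char
    return letter;
}
```

With US-ASCII codes, shift_letter('A', 2) produces 'C', just as the statement above does for the variable letter.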
CAUTION Although I’ve assumed US-ASCII coding in the examples, this is usual but not guaranteed, as I noted earlier. On older mainframe computers, for instance, characters may be represented using Extended Binary Coded Decimal Interchange Code (EBCDIC), in which the codes for some characters are different from US-ASCII.
You can explicitly declare a variable as type signed char or unsigned char, which will affect the range of integers that can be represented. For example, you can declare a variable as follows:
unsigned char ch = 0U;
In this case, the numerical values can range from 0 to 255.
When you read from a stream into a variable of type char, the first nonwhitespace character will be stored. This means that you can’t read whitespace characters in this way; they’re simply ignored. Further, you can’t read a numerical value into a variable of type char. If you try, you’ll find that the character code for the first digit will be stored. When you output a variable of type char to the screen, it will be as a character, not a numerical value. You can see this demonstrated in the next example.
Try It Out: Handling Character Values
This example reads a character from the keyboard, outputs the character and its numerical code, increments the value of the character, and outputs the result as a character and as an integer:
// Program 2.9 – Handling character values
#include <iostream>
using std::cin;
using std::cout;
using std::endl;

int main() {
  char ch = 0;
  int ch_value = 0;

  // Read a character from the keyboard
  cout << "Enter a character: ";
  cin >> ch;
  ch_value = ch;                      // Get integer value of character
  cout << endl << ch << " is " << ch_value;

  ch_value = ++ch;                    // Increment ch and store as integer
  cout << endl << ch << " is " << ch_value << endl;
  return 0;
}
Typical output from this example is as follows:
Enter a character: w
w is 119
x is 120
After prompting for input, the program reads a character from the keyboard with the statement
cin >> ch;
Only nonwhitespace characters are accepted, so you can press Enter or enter spaces and tabs and they’ll all be ignored.
Stream output will always output the variable ch as a character. To get the numerical code, you need a way to convert it to an integer type. The next statement does this:
ch_value = ch; // Get integer value of character
The compiler will arrange to convert the value stored in ch from type char to type int so that it can be stored in the variable ch_value. You’ll see more about automatic conversions in the next chapter, when I discuss expressions involving values of different types.
Now you can output the character as well as its integer code with the following statement:
cout << endl << ch << " is " << ch_value;
The next statement demonstrates that you can operate with variables of type char as integers:
ch_value = ++ch; // Increment ch and store as integer
This statement increments the contents of ch and stores the result in the variable ch_value, so you have both the next character and its numerical representation. This is output to the display with exactly the same statement as was used previously. Although you just incremented ch here, variables of type char can be used with all of the arithmetic operators, just like any of the integer types.
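If you’d like to check the stream behavior described in this section without typing at the keyboard, string streams from the standard <sstream> header behave the same way as cin and cout for these operations. The following is a sketch only. The three function names are invented for the illustration, and the value mentioned in the closing note assumes US-ASCII codes.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts one char with >>, which skips leading whitespace.
char first_extracted(const std::string& text) {
    std::istringstream in(text);
    char ch = 0;
    in >> ch;
    return ch;
}

// Hypothetical helper: a char written to a stream appears as a character.
std::string char_as_text(char ch) {
    std::ostringstream out;
    out << ch;
    return out.str();
}

// Hypothetical helper: storing the char in an int first makes the stream
// write the numerical code instead.
std::string code_as_text(char ch) {
    int code = ch;
    std::ostringstream out;
    out << code;
    return out.str();
}
```

Reading from the text "42" with first_extracted yields the character '4', not the number 42, and with US-ASCII codes code_as_text('w') produces the text 119.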
Working with Extended Character Sets
Single-byte character codes such as ASCII or EBCDIC are generally adequate for national language character sets that use Latin characters. There are also 8-bit character encodings that will accommodate other languages such as Greek or Russian. However, if you want to work with these and Latin characters simultaneously, or if you want to handle character sets for Asian languages that require much larger numbers of character codes than the ASCII set, 256 character codes don’t go far enough.
The type wchar_t is a character type that can store all members of the largest extended character set that’s supported by an implementation. The type name derives from wide characters, because the character is “wider” than the usual single-byte character. By contrast, type char is referred to as “narrow” because of the limited range of character codes that are available. The size of variables of type wchar_t isn’t stipulated by the C++ standard, except that it will have the same characteristics as one of the other integer types. It is often 2 bytes on PCs, and typically the underlying type is unsigned short, but it can also be 4 bytes with some compilers, especially those implemented on Unix workstations.
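Because the size varies, it can be worth checking what your own compiler does. A minimal sketch, in which the names wide_char_size and wide_char_bits are invented for the illustration:

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: size of wchar_t in bytes with this compiler.
constexpr std::size_t wide_char_size() {
    return sizeof(wchar_t);
}

// Hypothetical helper: the same size expressed in bits.
constexpr std::size_t wide_char_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}
```

On common desktop compilers you would expect wide_char_bits() to give 16 or 32, but the standard doesn’t promise either value.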
Wide-Character Literals
You define wide-character literals in the same way as narrow character literals that you use with type char, but you prefix them with the letter L. For example,
wchar_t wide_letter = L'Z';
defines the variable wide_letter to be of type wchar_t and initializes it to the wide-character representation for Z.
Your keyboard may not have keys for representing other national language characters, but you can still create them using hexadecimal notation, for example:
wchar_t wide_letter = L'\x0438'; // Cyrillic
The value between the single quotes is an escape sequence that allows you to specify a character by a hexadecimal representation of the character code. The backslash indicates the start of the escape sequence, and the x after the backslash signifies that the code is hexadecimal. The absence of x would indicate that the characters that follow are to be interpreted as octal digits.
Of course, you could also use the notation for UCS character literals:
wchar_t wide_letter = L'\u0438'; // Cyrillic
If your compiler supports 4-byte UCS characters, you could also initialize a variable of type wchar_t with a UCS character specified as \Udddddddd, where d is a hexadecimal digit.
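The following sketch puts the different notations for wide-character literals next to one another. The variable names and the helper same_code are invented for the illustration. The comparison in same_code assumes that your compiler stores UCS code values directly in wchar_t, which is common but not required.

```cpp
constexpr wchar_t latin_z      = L'Z';        // ordinary character with the L prefix
constexpr wchar_t cyrillic_hex = L'\x0438';   // hexadecimal escape sequence
constexpr wchar_t cyrillic_ucn = L'\u0438';   // UCS character notation

// Hypothetical helper: true when the hexadecimal escape and the UCS
// notation produce the same wide-character code with this compiler.
constexpr bool same_code() {
    return cyrillic_hex == cyrillic_ucn;
}
```

The hexadecimal form always has the value 0x0438. The UCS form names a character, and its stored code depends on how the implementation represents wide characters.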
Wide-Character Streams
The streams cin and cout that you’ve been using are narrow-character streams. They only handle characters that consist of a single byte, so you can’t extract from cin into a variable of type wchar_t. The <iostream> header defines special wide-character streams, wcin and wcout, for input and output of wide characters. You use the wide streams in the same way as the narrow streams. For instance, you can read a wide character from wcin like this:
wchar_t wide_letter = 0;
std::wcin >> wide_letter; // Read a wide character
Although you’ll always be able to write wide characters to wcout, this doesn’t mean that such characters will display correctly or at all. It depends on whether your operating system recognizes the character codes.
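Because display depends on the operating system, one way to experiment with wide-character extraction independently of the console is to read from a wide string stream instead of wcin. This is a sketch under that assumption. The function name first_wide is made up, and std::wistringstream comes from the standard <sstream> header.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts one wide character with >>, in the same way
// that you would extract from std::wcin. Leading whitespace is skipped.
wchar_t first_wide(const std::wstring& text) {
    std::wistringstream in(text);
    wchar_t wide_letter = 0;
    in >> wide_letter;
    return wide_letter;
}
```

Calling first_wide(L"  Z") returns L'Z', which mirrors what the statement reading from wcin does with keyboard input.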