Working with Characters - Practices

This article looks at some of the basic data types that are built into C++. If you're learning how to use C++, you will want to keep reading, since you'll be using these data types in all of your programs. It is taken from chapter two of the book Beginning ANSI C++: The Complete Language, by Ivor Horton (Apress, 2004; ISBN: 1590592271).

By: Apress Publishing
September 08, 2005

Variables of type char are primarily used to store a code for a single character and occupy 1 byte in memory. The C++ standard doesn’t specify the character encoding to be used for representing the basic character set, so this is determined by a particular compiler. It’s typically, but not exclusively, ASCII.

You declare variables of type char in the same way as variables of the other types that you’ve seen, for example:

char letter;
char yes, no;

The first statement declares a single variable of type char with the name letter. The second statement declares two variables of type char having the names yes and no. Each of these variables can store the code for a single character. Because you haven’t provided initial values for these variables, they’ll contain junk values.

Character Literals

When you declare a variable of type char, you can initialize it with a character literal. You write a character literal as the character that you require between single quotes. For example, 'z', '3', and '?' are all character literals.

Some characters are problematic to enter as literals. Obviously, a single quote presents a bit of a difficulty because it’s a delimiter for a character literal. In fact, it isn’t legal in C++ to put either a single quote or a backslash character on its own between single quotes. Control characters such as newline and tab are also a problem because they result in an effect when you press the key for the appropriate character rather than entering the character as data. You can specify all of these problem characters by using escape sequences that begin with a backslash, as shown in Table 2-13.

To specify a character literal corresponding to any of these characters, you just type in the corresponding escape sequence between single quotes. For instance, newline is '\n' and backslash is '\\'.

There are also escape sequences that you can use to specify a character by its code expressed as either an octal or a hexadecimal value. The escape sequence for an octal character code is one to three octal digits preceded by a backslash. The escape sequence for a hexadecimal character code is one or more hexadecimal digits preceded by \x. You write both forms between single quotes when you want to define a character literal. For example, the letter 'A' could be written as hexadecimal '\x41' or octal '\101' in US-ASCII code. Obviously, you could write codes that won’t fit within a single byte, in which case the result is implementation defined.

If you write a character literal with more than one character between the single quotes and the characters don’t represent an escape sequence ('abc' is an example), then the literal is described as a multicharacter literal and will be of type int. The numerical value of such a literal is implementation defined but will usually be the result of placing the 1-byte codes for the characters in successive bytes of the int value. If you specify a multicharacter literal with more than four characters, this will usually result in an error message from the compiler.

You now know enough about character literals to initialize your variables of type char properly.

Initializing char Variables

You can define and initialize a variable of type char with the statement

char letter = 'A';                      // Stores a single letter 'A'

This statement defines the variable with the name letter to be of type char with an initial value 'A'. If your compiler represents characters using US-ASCII codes, this will have the decimal value 65.

You can declare and initialize multiple variables in a single statement:

char yes = 'y', no = 'n', tab = '\t';

Because you can treat variables of type char as integers, you could equally well declare and initialize the variable letter with this statement:

char letter = 65; // Stores the ASCII code for 'A'

Remember that type char may be signed or unsigned by default, depending on the compiler, so this will affect what numerical values can be accommodated. If char is unsigned, values can be from 0 to 255. If it’s signed, values can be from –128 to +127. Of course, the range of bit patterns that can be stored is the same in both cases. They’re just interpreted differently.

You can also use the variable letter as an operand in integer operations, so you can write

letter += 2;

This will result in the value stored in letter being incremented to 67, which is 'C' in US-ASCII. You can find all the US-ASCII codes in Appendix A of this book.

CAUTION Although I’ve assumed US-ASCII coding in the examples, and this is usually the case, it doesn’t have to be so, as I noted earlier. On older mainframe computers, for instance, characters may be represented using Extended Binary Coded Decimal Interchange Code (EBCDIC), in which the codes for some characters are different from US-ASCII.

You can explicitly declare a variable as type signed char or unsigned char, which will affect the range of integers that can be represented. For example, you can declare a variable as follows:

unsigned char ch = 0U;

In this case, the numerical values can range from 0 to 255.

When you read from a stream into a variable of type char, the first nonwhitespace character will be stored. This means that you can’t read whitespace characters in this way; they’re simply ignored. Further, you can’t read a numerical value into a variable of type char. If you try, you’ll find that the character code for the first digit will be stored. When you output a variable of type char to the screen, it will be as a character, not a numerical value. You can see this demonstrated in the next example.

Try It Out: Handling Character Values

This example reads a character from the keyboard, outputs the character and its numerical code, increments the value of the character, and outputs the result as a character and as an integer:

// Program 2.9 – Handling character values
#include <iostream>
using std::cin;
using std::cout;
using std::endl;

int main() {
  char ch = 0;
  int ch_value = 0;

  // Read a character from the keyboard
  cout << "Enter a character: ";
  cin >> ch;
  ch_value = ch;                        // Get integer value of character

  cout << endl << ch << " is " << ch_value;

  ch_value = ++ch;                      // Increment ch and store as integer

  cout << endl << ch << " is " << ch_value << endl;

  return 0;
}

Typical output from this example is as follows:

Enter a character: w

w is 119
x is 120

After prompting for input, the program reads a character from the keyboard with the statement

cin >> ch;

Only nonwhitespace characters are accepted, so you can press Enter or enter spaces and tabs and they’ll all be ignored.

Stream output will always output the variable ch as a character. To get the numerical code, you need a way to convert it to an integer type. The next statement does this:

ch_value = ch;                          // Get integer value of character

The compiler will arrange to convert the value stored in ch from type char to type int so that it can be stored in the variable ch_value. You’ll see more about automatic conversions in the next chapter, when I discuss expressions involving values of different types.

Now you can output the character as well as its integer code with the following statement:

cout << endl << ch << " is " << ch_value;

The next statement demonstrates that you can operate with variables of type char as integers:

ch_value = ++ch;                        // Increment ch and store as integer

This statement increments the contents of ch and stores the result in the variable ch_value, so you have both the next character and its numerical representation. This is output to the display with exactly the same statement as was used previously. Although you just incremented ch here, variables of type char can be used with all of the arithmetic operators, just like any of the integer types.

Working with Extended Character Sets

Single-byte character codes such as ASCII or EBCDIC are generally adequate for national language character sets that use Latin characters. There are also 8-bit character encodings that will accommodate other languages such as Greek or Russian. However, if you want to work with these and Latin characters simultaneously, or if you want to handle character sets for Asian languages that require much larger numbers of character codes than the ASCII set, 256 character codes aren’t enough.

The type wchar_t is a character type that can store all members of the largest extended character set that’s supported by an implementation. The type name derives from wide characters, because the character is “wider” than the usual single-byte character. By contrast, type char is referred to as “narrow” because of the limited range of character codes that are available. The size of variables of type wchar_t isn’t stipulated by the C++ standard, except that it will have the same characteristics as one of the other integer types. It is often 2 bytes on PCs, and typically the underlying type is unsigned short, but it can also be 4 bytes with some compilers, especially those implemented on Unix workstations.

Wide-Character Literals

You define wide-character literals in the same way as narrow character literals that you use with type char, but you prefix them with the letter L. For example,

wchar_t wide_letter = L'Z';

defines the variable wide_letter to be of type wchar_t and initializes it to the wide-character representation for Z.

Your keyboard may not have keys for representing other national language characters, but you can still create them using hexadecimal notation, for example:

wchar_t wide_letter = L'\x0438';        // Cyrillic

The value between the single quotes is an escape sequence that allows you to specify a character by a hexadecimal representation of the character code. The backslash indicates the start of the escape sequence, and the x after the backslash signifies that the code is hexadecimal. Without the x, the digits that follow the backslash would be interpreted as octal digits.

Of course, you could also use the notation for UCS character literals:

wchar_t wide_letter = L'\u0438';        // Cyrillic

If your compiler supports 4-byte UCS characters, you could also initialize a variable of type wchar_t with a UCS character specified as \Udddddddd, where d is a hexadecimal digit.

Wide-Character Streams

The streams cin and cout that you’ve been using are narrow-character streams. They only handle characters that consist of a single byte, so you can’t extract from cin into a variable of type wchar_t. The <iostream> header defines special wide-character streams, wcin and wcout, for input and output of wide characters. You use the wide streams in the same way as the narrow streams. For instance, you can read a wide character from wcin like this:

wchar_t wide_letter = 0;
std::wcin >> wide_letter;               // Read a wide character

Although you’ll always be able to write wide characters to wcout, this doesn’t mean that such characters will display correctly or at all. It depends on whether your operating system recognizes the character codes.



 
 