Basic Data Types and Calculations - Floating-Point Operations
Numerical values that aren’t integers are stored as floating-point numbers. Internally, floating-point numbers have three parts: a sign (positive or negative), a mantissa (which is a value greater than or equal to 1 and less than 2 that has a fixed number of digits), and an exponent. Inside your computer, of course, both the mantissa and the exponent are binary values, but for the purposes of explaining how floating-point numbers work, I’ll talk about them as decimal values.
The value of a floating-point number is the signed value of the mantissa, multiplied by 10 to the power of the exponent, as shown in Table 2-7.
Table 2-7. Floating-Point Number Value
| Sign (+/-) | Mantissa | Exponent | Value |
| - | 1.2345 | 3 | –1.2345x10^3 (which is –1234.5) |
You can write a floating point literal in three basic forms:
- As a decimal value including a decimal point (for example, 110.0).
- With an exponent (for example, 11E1) in which the decimal part is multiplied by the power of 10 specified after the E (for exponent). You have the option of using either an upper- or a lowercase letter E to precede the exponent.
- Using both a decimal point and an exponent (for example, 1.1E2).
All three examples correspond to the same value, 110.0. Note that spaces aren’t allowed within floating-point literals, so you must not write 1.1 E2, for example. The latter would be interpreted by the compiler as two separate things: the floating-point literal 1.1 and the name E2.
NOTE A floating-point literal must contain a decimal point, or an exponent, or both. If you write a numeric literal with neither, then you have an integer.
Floating-Point Data Types
There are three floating-point data types that you can use, as described in Table 2-8.
Table 2-8. Floating-Point Data Types
| Data Type | Description |
| float | Single precision floating-point values |
| double | Double precision floating-point values |
| long double | Double-extended precision floating-point values |
The term “precision” here refers to the number of significant digits in the mantissa. The data types are in order of increasing precision, with float providing the lowest number of digits in the mantissa and long double the highest. Note that the precision only determines the number of digits in the mantissa. The range of numbers that can be represented by a particular type is determined by the range of possible exponents.
The precision and range of values aren’t prescribed by the ANSI standard for C++, so what you get with each of these types depends on your compiler. Your compiler will usually make the best use of the floating-point hardware facilities provided by your computer. Generally, type long double will provide a precision that’s greater than or equal to that of type double, which in turn will provide a precision that is greater than or equal to that of type float.
Typically, you’ll find that type float will provide 7 digits of precision, type double will provide 15 digits of precision, and type long double will provide 19 digits of precision, although double and long double turn out to be the same with some compilers. As well as increased precision, you’ll usually get an increased range of values with types double and long double.
Typical ranges of values that you can represent with the floating-point types on an Intel processor are shown in Table 2-9.
Table 2-9. Floating-Point Type Ranges
| Type | Precision (Decimal Digits) | Range (+ or –) |
| float | 7 | 1.2x10^-38 to 3.4x10^38 |
| double | 15 | 2.2x10^-308 to 1.8x10^308 |
| long double | 19 | 3.3x10^-4932 to 1.2x10^4932 |
The numbers of decimal digits of precision in Table 2-9 are approximate. Zero can be represented exactly for each of these types, but values between zero and the lower limit in the positive or negative range can’t be represented, so these lower limits for the ranges are the smallest nonzero values that you can have.
Simple floating-point literals with just a decimal point are of type double, so let’s look at how to define variables of that type first. You can specify a floating-point variable using the keyword double, as in this statement:
double inches_to_mm = 25.4;
This declares the variable inches_to_mm to be of type double and initializes it with the value 25.4. You can also use const when declaring floating-point variables, and this is a case in which you could sensibly do so. If you want to fix the value of the variable, the declaration statement might be
const double inches_to_mm = 25.4; // A constant conversion factor
If you don’t need the precision and range of values that variables of type double provide, you can opt to use the keyword float to declare your floating-point variable, for example:
float pi = 3.14159f;
This statement defines a variable pi with the initial value 3.14159. The f at the end of the literal specifies it to be a float type. Without the f, the literal would have been of type double, which wouldn’t cause a problem in this case, although you may get a warning message from your compiler. You can also use an uppercase letter F to indicate that a floating-point literal is of type float.
To specify a literal of type long double, you append an upper- or lowercase L to the number. You could therefore declare and initialize a variable of this type with the statement
long double root2 = 1.4142135623730950488L; // Square root of 2
Floating-Point Operations
The modulus operator, %, can’t be used with floating-point operands, but all the other binary arithmetic operators that you have seen, +, -, *, and /, can be. You can also apply the prefix and postfix increment and decrement operators, ++ and --, to a floating-point variable with the same effect as for an integer: the variable will be incremented or decremented by 1.
As with integer operands, the result of division by zero is undefined so far as the standard is concerned, but specific C++ implementations generally have their own way of dealing with this, so consult your product documentation.
With most computers today, the hardware floating-point operations are implemented according to the IEEE 754 standard (also known as IEC 559). Although IEEE 754 isn’t required by the C++ standard, the C++ standard does provide for identification of some aspects of floating-point operations on machines on which IEEE 754 applies. The floating-point standard defines special values having a binary mantissa of all zeros and an exponent of all ones to represent +infinity or -infinity, depending on the sign. When you divide a positive nonzero value by zero, the result will be +infinity, and dividing a negative value by zero will result in -infinity. Another special floating-point value defined by IEEE 754 is called Not a Number, usually abbreviated to NaN. This is used to represent a result that isn’t mathematically defined, such as arises when you divide zero by zero or you divide infinity by infinity.
Any subsequent operation in which either or both operands are a value of NaN results in NaN. Once an operation in your program results in a value of ±infinity, this will pollute all subsequent operations in which it participates. Combining a normal value with ±infinity results in ±infinity. Dividing ±infinity by ±infinity or multiplying ±infinity by zero results in NaN. Table 2-10 summarizes all these possibilities.
Table 2-10. Floating-Point Operations with NaN Operands
| Operation | Result | Operation | Result |
| ±N/0 | ±Infinity | 0/0 | NaN |
| ±Infinity±N | ±Infinity | ±Infinity/±Infinity | NaN |
| ±Infinity*N | ±Infinity | Infinity-Infinity | NaN |
| ±Infinity/N | ±Infinity | Infinity*0 | NaN |
Using floating-point variables is really quite straightforward, but there’s no substitute for experience, so let’s try an example.