Numerical values that aren’t integers are stored as floating-point numbers. Internally, floating-point numbers have three parts: a sign (positive or negative), a mantissa (which is a value greater than or equal to 1 and less than 2 that has a fixed number of digits), and an exponent. Inside your computer, of course, both the mantissa and the exponent are binary values, but for the purposes of explaining how floating-point numbers work, I’ll talk about them as decimal values.
The value of a floating-point number is the signed value of the mantissa, multiplied by 10 to the power of the exponent, as shown in Table 2-7.
Table 2-7. Floating-Point Number Value
Sign (+/-)   Mantissa   Exponent   Value
-            1.2345     3          –1.2345 × 10^3 (which is –1234.5)
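The rule behind Table 2-7 can be sketched in a few lines of code. This is only an illustration of the decimal model used in the text, not of how the hardware stores the value; the function name decimal_value and its parameters are hypothetical.

```cpp
#include <cmath>

// Hypothetical helper: combines the three parts of the decimal model
// from the text into a single value.
//   negative - true when the sign is minus
//   mantissa - the digits, for example 1.2345
//   exponent - the power of 10 to multiply by, for example 3
double decimal_value(bool negative, double mantissa, int exponent) {
  const double magnitude = mantissa * std::pow(10.0, exponent);
  return negative ? -magnitude : magnitude;
}
```

Calling decimal_value(true, 1.2345, 3) corresponds to the row in the table and yields a value of about -1234.5, subject to the usual binary rounding.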
You can write a floating-point literal in three basic forms: with a decimal point, with an exponent, or with both.
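As a minimal sketch of those three forms, the definitions below spell one value in three ways. The particular literals and the variable names are illustrative assumptions, chosen so that each one equals 110.0.

```cpp
// Three spellings of the same value; the literals are illustrative choices.
const double with_point    = 110.0;  // decimal point only
const double with_exponent = 11E1;   // exponent only: 11 times 10 to the power 1
const double with_both     = 1.1E2;  // decimal point and exponent: 1.1 times 10 to the power 2
```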
All three examples correspond to the same value, 110.0. Note that spaces aren’t allowed within floating-point literals, so you must not write 1.1 E2, for example. The latter would be interpreted by the compiler as two separate things: the floating-point literal 1.1 and the name E2.
NOTE A floating-point literal must contain a decimal point, or an exponent, or both. If you write a numeric literal with neither, then you have an integer.
Floating-Point Data Types
There are three floating-point data types that you can use, as described in Table 2-8.
The term “precision” here refers to the number of significant digits in the mantissa. The data types are in order of increasing precision, with float providing the lowest number of digits in the mantissa and long double the highest. Note that the precision only determines the number of digits in the mantissa. The range of numbers that can be represented by a particular type is determined by the range of possible exponents.
The precision and range of values aren’t prescribed by the ANSI standard for C++, so what you get with each of these types depends on your compiler. Compilers will usually make the best of the floating-point hardware facilities provided by your computer. Generally, type long double will provide a precision that’s greater than or equal to that of type double, which in turn will provide a precision that is greater than or equal to that of type float.
Typically, you’ll find that type float will provide 7 digits precision, type double will provide 15 digits precision, and type long double will provide 19 digits precision, although double and long double turn out to be the same with some compilers. As well as increased precision, you’ll usually get an increased range of values with types double and long double.
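If you want to see what your own compiler provides, one option is the standard numeric_limits template from the <limits> header. The sketch below is an assumption about how you might check it, and the constant names are hypothetical. Note that digits10 counts only the decimal digits that are guaranteed to be preserved, so it can report a figure slightly lower than the typical values quoted above.

```cpp
#include <limits>

// Decimal digits each type is guaranteed to preserve on this implementation.
// The actual numbers depend on your compiler and hardware.
constexpr int float_digits       = std::numeric_limits<float>::digits10;
constexpr int double_digits      = std::numeric_limits<double>::digits10;
constexpr int long_double_digits = std::numeric_limits<long double>::digits10;

// The ordering described in the text: each type is at least as precise
// as the one before it.
static_assert(float_digits <= double_digits,
              "double is at least as precise as float");
static_assert(double_digits <= long_double_digits,
              "long double is at least as precise as double");
```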
Typical ranges of values that you can represent with the floating-point types on an Intel processor are shown in Table 2-9.
The numbers of decimal digits of precision in Table 2-9 are approximate. Zero can be represented exactly for each of these types, but values between zero and the lower limit in the positive or negative range can’t be represented, so these lower limits for the ranges are the smallest nonzero values that you can have.
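The limits of each range can be queried in a similar way. In this sketch, FloatRange and range_of are hypothetical names introduced for illustration; min() and max() are the standard numeric_limits members, where min() for a floating-point type is the smallest positive normalized value rather than the most negative one.

```cpp
#include <limits>

// Hypothetical holder for the two ends of the positive range of a type.
template <typename T>
struct FloatRange {
  T smallest_positive;  // smallest positive normalized value
  T largest;            // largest finite value
};

// Hypothetical helper: reads both limits for floating-point type T.
template <typename T>
FloatRange<T> range_of() {
  return {std::numeric_limits<T>::min(), std::numeric_limits<T>::max()};
}
```

For example, range_of<float>() reports the limits for type float on your implementation; the negative range is the mirror image of the positive one.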
Simple floating-point literals with just a decimal point are of type double, so let’s look at how to define variables of that type first. You can specify a floating-point variable using the keyword double, as in this statement:
double inches_to_mm = 25.4;
This declares the variable inches_to_mm to be of type double and initializes it with the value 25.4. You can also use const when declaring floating-point variables, and this is a case in which you could sensibly do so. If you want to fix the value of the variable, the declaration statement might be
const double inches_to_mm = 25.4; // A constant conversion factor
If you don’t need the precision and range of values that variables of type double provide, you can opt to use the keyword float to declare your floating-point variable, for example:
float pi = 3.14159f;
This statement defines a variable pi with the initial value 3.14159. The f at the end of the literal specifies it to be a float type. Without the f, the literal would have been of type double, which wouldn’t cause a problem in this case, although you may get a warning message from your compiler. You can also use an uppercase letter F to indicate that a floating-point literal is of type float.
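One way to convince yourself of the literal types is to let the compiler check them. The following sketch assumes a C++11 compiler and uses decltype with the standard is_same trait; it is offered as an illustration rather than something you would normally write.

```cpp
#include <type_traits>

// Each static_assert is checked at compile time and produces no code.
static_assert(std::is_same<decltype(3.14159), double>::value,
              "a literal with no suffix is of type double");
static_assert(std::is_same<decltype(3.14159f), float>::value,
              "a lowercase f suffix makes the literal a float");
static_assert(std::is_same<decltype(3.14159F), float>::value,
              "an uppercase F suffix also makes the literal a float");

const float pi = 3.14159f;  // No conversion from double is needed here
```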
To specify a literal of type long double, you append an upper- or lowercase L to the number. You could therefore declare and initialize a variable of this type with the statement
long double root2 = 1.4142135623730950488L; // Square root of 2
Floating-Point Operations
The modulus operator, %, can’t be used with floating-point operands, but all the other binary arithmetic operators that you have seen, +, -, *, and /, can be. You can also apply the prefix and postfix increment and decrement operators, ++ and --, to a floating-point variable with the same effect as for an integer: the variable will be incremented or decremented by 1.
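A short sketch of these operations follows. The function names are hypothetical. The use of std::fmod from <cmath> is an addition that the text doesn’t cover: it is the standard library’s way of getting a remainder for floating-point values, since % won’t compile with them.

```cpp
#include <cmath>

// Hypothetical helper: the ordinary binary operators work on double operands.
double average(double a, double b) {
  return (a + b) / 2.0;
}

// Hypothetical helper: ++ adds 1 to a floating-point variable,
// just as it does for an integer.
double incremented(double x) {
  ++x;
  return x;
}

// Hypothetical helper: a % b is not allowed for floating-point operands,
// but std::fmod computes the remainder of a divided by b.
double fp_remainder(double a, double b) {
  return std::fmod(a, b);
}
```

With these definitions, incremented(2.5) gives 3.5 and fp_remainder(7.5, 2.0) gives 1.5.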
As with integer operands, the result of division by zero is undefined so far as the standard is concerned, but specific C++ implementations generally have their own way of dealing with this, so consult your product documentation.
With most computers today, the hardware floating-point operations are implemented according to the IEEE 754 standard (also known as IEC 559). Although IEEE 754 isn’t required by the C++ standard, the C++ standard does provide for identification of some aspects of floating-point operations on machines on which IEEE 754 applies. The floating-point standard defines special values having a binary mantissa of all zeros and an exponent of all ones to represent +infinity or -infinity, depending on the sign. When you divide a positive nonzero value by zero, the result will be +infinity, and dividing a negative value by zero will result in -infinity. Another special floating-point value defined by IEEE 754 is called Not a Number, usually abbreviated to NaN. This is used to represent a result that isn’t mathematically defined, such as arises when you divide zero by zero or you divide infinity by infinity.
Any subsequent operation in which either or both operands are a value of NaN results in NaN. Once an operation in your program results in a value of ±infinity, this will pollute all subsequent operations in which it participates. Combining a normal value with ±infinity results in ±infinity. Dividing ±infinity by ±infinity or multiplying ±infinity by zero results in NaN. Table 2-10 summarizes all these possibilities.
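These rules can be tried out without dividing by zero, which the C++ standard leaves undefined. The sketch below assumes an implementation where IEEE 754 applies and obtains the special values from numeric_limits instead; all the function names are hypothetical, and the results are only meaningful when ieee754_double is true.

```cpp
#include <cmath>
#include <limits>

// True when the implementation reports IEEE 754 (IEC 559) behavior for double.
constexpr bool ieee754_double = std::numeric_limits<double>::is_iec559;

// Hypothetical helpers that fetch the special values.
double positive_infinity() { return std::numeric_limits<double>::infinity(); }
double not_a_number() { return std::numeric_limits<double>::quiet_NaN(); }

// Combining a normal value with infinity results in infinity.
bool infinity_absorbs_normal_values() {
  const double inf = positive_infinity();
  return std::isinf(inf + 1000.0) && std::isinf(inf * 2.0);
}

// Infinity divided by infinity, or infinity multiplied by zero, results in NaN.
bool infinity_can_yield_nan() {
  const double inf = positive_infinity();
  return std::isnan(inf / inf) && std::isnan(inf * 0.0);
}

// Any operation with a NaN operand results in NaN.
bool nan_propagates() {
  const double nan_value = not_a_number();
  return std::isnan(nan_value + 1.0) && std::isnan(nan_value * 0.0);
}
```

On an IEEE 754 implementation compiled with default floating-point settings, each of the three checking functions should return true.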
Using floating-point variables is really quite straightforward, but there’s no substitute for experience, so let’s try an example.