Numerical values that aren’t integers are stored as floating-point numbers. Internally, floating-point numbers have three parts: a sign (positive or negative), a mantissa (which is a value greater than or equal to 1 and less than 2 that has a fixed number of digits), and an exponent. Inside your computer, of course, both the mantissa and the exponent are binary values, but for the purposes of explaining how floating-point numbers work, I’ll talk about them as decimal values.

The value of a floating-point number is the signed value of the mantissa, multiplied by 10 to the power of the exponent, as shown in Table 2-7.

Table 2-7. Floating-Point Number Value

Sign (+/-)   Mantissa   Exponent   Value
-            1.2345     3          -1.2345 x 10^3 (which is -1234.5)
You can write a floating-point literal in three basic forms: with a decimal point, with an exponent, or with both.
All three examples correspond to the same value, 110.0. Note that spaces aren’t allowed within floating-point literals, so you must not write 1.1 E2, for example. The latter would be interpreted by the compiler as two separate things: the floating-point literal 1.1 and the name E2.

NOTE A floating-point literal must contain a decimal point, or an exponent, or both. If you write a numeric literal with neither, then you have an integer.

Floating-Point Data Types

There are three floating-point data types that you can use, as described in Table 2-8.
The term “precision” here refers to the number of significant digits in the mantissa. The data types are in order of increasing precision, with float providing the lowest number of digits in the mantissa and long double the highest. Note that the precision only determines the number of digits in the mantissa. The range of numbers that can be represented by a particular type is determined by the range of possible exponents.

The precision and range of values aren’t prescribed by the ANSI standard for C++, so what you get with each of these types depends on your compiler. This will usually make the best of the floating-point hardware facilities provided by your computer. Generally, type long double will provide a precision that’s greater than or equal to that of type double, which in turn will provide a precision that is greater than or equal to that of type float.

Typically, you’ll find that type float will provide 7 digits precision, type double will provide 15 digits precision, and type long double will provide 19 digits precision, although double and long double turn out to be the same with some compilers. As well as increased precision, you’ll usually get an increased range of values with types double and long double. Typical ranges of values that you can represent with the floating-point types on an Intel processor are shown in Table 2-9.
The numbers of decimal digits of precision in Table 2-9 are approximate. Zero can be represented exactly for each of these types, but values between zero and the lower limit in the positive or negative range can’t be represented, so these lower limits for the ranges are the smallest nonzero values that you can have.

Simple floating-point literals with just a decimal point are of type double, so let’s look at how to define variables of that type first. You can specify a floating-point variable using the keyword double, as in this statement:

    double inches_to_mm = 25.4;

This declares the variable inches_to_mm to be of type double and initializes it with the value 25.4. You can also use const when declaring floating-point variables, and this is a case in which you could sensibly do so. If you want to fix the value of the variable, the declaration statement might be

    const double inches_to_mm = 25.4;    // A constant conversion factor

If you don’t need the precision and range of values that variables of type double provide, you can opt to use the keyword float to declare your floating-point variable, for example:

    float pi = 3.14159f;

This statement defines a variable pi with the initial value 3.14159. The f at the end of the literal specifies it to be a float type. Without the f, the literal would have been of type double, which wouldn’t cause a problem in this case, although you may get a warning message from your compiler. You can also use an uppercase letter F to indicate that a floating-point literal is of type float.

To specify a literal of type long double, you append an upper- or lowercase L to the number. You could therefore declare and initialize a variable of this type with the statement

    long double root2 = 1.4142135623730950488L;    // Square root of 2

Floating-Point Operations

The modulus operator, %, can’t be used with floating-point operands, but all the other binary arithmetic operators that you have seen, +, -, *, and /, can be.
You can also apply the prefix and postfix increment and decrement operators, ++ and --, to a floating-point variable with the same effect as for an integer: the variable will be incremented or decremented by 1. As with integer operands, the result of division by zero is undefined so far as the standard is concerned, but specific C++ implementations generally have their own way of dealing with this, so consult your product documentation.

With most computers today, the hardware floating-point operations are implemented according to the IEEE 754 standard (also known as IEC 559). Although IEEE 754 isn’t required by the C++ standard, it does provide for identification of some aspects of floating-point operations on machines on which IEEE 754 applies. The floating-point standard defines special values having a binary mantissa of all zeros and an exponent of all ones to represent +infinity or -infinity, depending on the sign. When you divide a positive nonzero value by zero, the result will be +infinity, and dividing a negative value by zero will result in -infinity.

Another special floating-point value defined by IEEE 754 is called Not a Number, usually abbreviated to NaN. This is used to represent a result that isn’t mathematically defined, such as arises when you divide zero by zero or you divide infinity by infinity. Any subsequent operation in which either or both operands are a value of NaN results in NaN. Once an operation in your program results in a value of ±infinity, this will pollute all subsequent operations in which it participates. Combining a normal value with ±infinity results in ±infinity. Dividing ±infinity by ±infinity or multiplying ±infinity by zero results in NaN. Table 2-10 summarizes all these possibilities.
Using floating-point variables is really quite straightforward, but there’s no substitute for experience, so let’s try an example.