COHERENT manpages

This page displays the COHERENT manpage for float [Data type].
float -- C Keyword

Data type

Floating point numbers are a subset of the real numbers.  Each has a
built-in radix point (or ``decimal point'') that shifts, or ``floats'', as
the value of the number changes.  Each consists of the following: one sign
bit, which indicates whether the number is positive or negative; bits that
encode the number's exponent; and bits that encode the number's fraction,
or the number upon which the exponent works.  In general, the range of
magnitudes that can be encoded depends upon the number of bits in the
exponent, whereas the precision depends upon the number of bits in the
fraction.

The ranges of values that can be held by a COHERENT float are set in header
file float.h.
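
As a minimal sketch, the limits for a float can be read straight from
float.h.  The macros FLT_DIG, FLT_MIN, FLT_MAX, and FLT_EPSILON are the
standard ANSI C names; the helper print_float_limits() is hypothetical, and
the values printed depend on the floating-point format in use:

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: print the limits of a float as set in float.h. */
void print_float_limits(void)
{
    printf("decimal digits of precision: %d\n", FLT_DIG);
    printf("smallest normalized value:   %e\n", FLT_MIN);
    printf("largest value:               %e\n", FLT_MAX);
    printf("epsilon:                     %e\n", FLT_EPSILON);
}
```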

The exponent often uses a bias.  This is a value that is subtracted from
the stored exponent to yield the power of two by which the fraction is
multiplied.
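
The arithmetic can be sketched in C as follows.  The helper name
apply_bias() is an assumption made for illustration; ldexp() is the
standard library function that multiplies a value by a power of two:

```c
#include <math.h>

/*
 * Hypothetical helper: subtract the bias from the stored exponent and
 * scale the fraction by the resulting power of two.
 */
double apply_bias(double fraction, int stored_exponent, int bias)
{
    return ldexp(fraction, stored_exponent - bias);
}
```

For example, with a bias of 127, a stored exponent of 130 and a fraction of
1.5 give 1.5 times two to the power 3, or 12.0.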

Floating point numbers come in two levels of precision: single precision,
called floats; and double precision, called doubles.  With most
microprocessors, sizeof(float) yields four, which indicates that a float is
four chars (bytes) long, and sizeof(double) yields eight.

Several formats are used to  encode floats, including IEEE, DECVAX, and BCD
(binary coded decimal).

The  following   describes  DECVAX,  IEEE,   and  BCD  formats,   for  your
information.

DECVAX Format

The 32 bits in a float consist of one sign bit, an eight-bit exponent, and
a 24-bit fraction, of which only 23 bits are stored (see the description of
the hidden bit, below).  Note that in this diagram, `s' indicates
``sign'', `e' indicates ``exponent'', and `f' indicates ``fraction'':



     =============
     | seee eeee |Byte 4
     |===========|
     | efff ffff |Byte 3
     |===========|
     | ffff ffff |Byte 2
     |===========|
     | ffff ffff |Byte 1
     =============

The exponent has a bias of 129.

If the  sign bit is  set to one,  the number is  negative; if it  is set to
zero, then  the number is positive.   If the number is  all zeroes, then it
equals  zero;  an  exponent  and  fraction  of zero  plus  a  sign  of  one
(``negative zero'')  is by  definition not a  number.  All other  forms are
numeric values.

The most  significant bit in the  fraction is always set to  one and is not
stored.  It is usually called the ``hidden bit''.
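
The description above can be sketched as a decoder.  This is an
illustration only: decode_decvax_float() is a hypothetical name, the bit
pattern is assumed to be laid out as in the diagram (Byte 4 in the
high-order bits), and the order in which a real machine stores the bytes in
memory is ignored:

```c
#include <math.h>

#define DECVAX_BIAS 129     /* bias given above for this format */

/*
 * Hypothetical helper: decode a 32-bit DECVAX float pattern.
 * Returns 0 and stores the value in *out, or returns -1 for the
 * ``negative zero'' pattern, which is not a number.
 */
int decode_decvax_float(unsigned long bits, double *out)
{
    int sign = (int)((bits >> 31) & 0x1UL);
    int exponent = (int)((bits >> 23) & 0xFFUL);
    unsigned long stored = bits & 0x7FFFFFUL;   /* 23 stored bits */
    double fraction;

    if (exponent == 0 && stored == 0) {
        if (sign)
            return -1;          /* not a number */
        *out = 0.0;
        return 0;
    }

    /* Restore the hidden bit: the fraction is 1.fff... in binary. */
    fraction = 1.0 + (double)stored / 8388608.0;    /* 2 to the 23rd */
    *out = ldexp(fraction, exponent - DECVAX_BIAS);
    if (sign)
        *out = -*out;
    return 0;
}
```

With this layout, the pattern 0x40800000 has a sign of zero, an exponent of
129, and a stored fraction of zero, so it decodes to 1.0.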

The format for  doubles simply adds another 32 fraction  bits to the end of
the float representation, as follows:



     =============
     | seee eeee |Byte 8
     |===========|
     | efff ffff |Byte 7
     |===========|
     | ffff ffff |Byte 6
     |===========|
     | ffff ffff |Byte 5
     |===========|
     | ffff ffff |Byte 4
     |===========|
     | ffff ffff |Byte 3
     |===========|
     | ffff ffff |Byte 2
     |===========|
     | ffff ffff |Byte 1
     =============

IEEE Format

The IEEE  encoding of  a float is  the same as  that in the  DECVAX format.
Note, however, that the exponent has a bias of 127, rather than 129.

Unlike the  DECVAX format,  IEEE format  assigns special values  to several
floating point  numbers.  Note  that in  the following description,  a tiny
exponent is one that is all  zeroes, and a huge exponent is one that is all
ones:

-> A tiny exponent  with a fraction of zero equals  zero, regardless of the
   setting of the sign bit.

-> A huge exponent with a fraction of zero equals infinity; the sign bit
   indicates whether it is positive or negative infinity.

-> A tiny exponent with a fraction greater than zero is a denormalized
   number, i.e., a number whose magnitude is less than that of the least
   normalized number.

-> A huge exponent with a fraction greater than zero is, by definition, not
   a number.  These values can be used to handle special conditions.
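
The four rules above can be sketched as a classifier for a 32-bit IEEE
float pattern.  The enumeration and the function classify_ieee_float() are
hypothetical names introduced for illustration; the pattern is assumed to
be laid out with the sign bit in the highest-order position:

```c
/* Hypothetical classification of an IEEE float bit pattern. */
enum ieee_class {
    IEEE_ZERO,
    IEEE_DENORMAL,
    IEEE_NORMAL,
    IEEE_INFINITY,
    IEEE_NAN
};

enum ieee_class classify_ieee_float(unsigned long bits)
{
    unsigned long exponent = (bits >> 23) & 0xFFUL;
    unsigned long fraction = bits & 0x7FFFFFUL;

    if (exponent == 0)          /* tiny exponent: all zeroes */
        return fraction == 0 ? IEEE_ZERO : IEEE_DENORMAL;
    if (exponent == 0xFFUL)     /* huge exponent: all ones */
        return fraction == 0 ? IEEE_INFINITY : IEEE_NAN;
    return IEEE_NORMAL;
}
```

The sign bit is deliberately not examined, because none of the four rules
depends upon it.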

An IEEE double, unlike DECVAX format, increases the number of exponent
bits.  It consists of a sign bit, an 11-bit exponent, and a 53-bit
fraction, of which 52 bits are stored and the most significant bit is
hidden, as follows:



    =============
    | seee eeee |   Byte 8
    |===========|
    | eeee ffff |   Byte 7
    |===========|
    | ffff ffff |   Byte 6
    |===========|
    | ffff ffff |   Byte 5
    |===========|
    | ffff ffff |   Byte 4
    |===========|
    | ffff ffff |   Byte 3
    |===========|
    | ffff ffff |   Byte 2
    |===========|
    | ffff ffff |   Byte 1
    =============

The exponent  has a bias of  1,023.  The rules of encoding  are the same as
for floats.
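
As a rough sketch, the three fields shown in the diagram can be pulled out
of a double as follows.  This assumes a C99 compiler (for stdint.h) on a
host whose double uses the 64-bit IEEE format; neither assumption
necessarily holds for the COHERENT compiler.  The structure and the function
split_ieee_double() are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the fields of an IEEE double. */
struct double_fields {
    unsigned sign;          /* 1 bit */
    unsigned exponent;      /* 11 bits, still biased by 1,023 */
    uint64_t fraction;      /* 52 stored bits; hidden bit not included */
};

struct double_fields split_ieee_double(double value)
{
    struct double_fields f;
    uint64_t bits;

    /* Copy the bytes rather than cast, to avoid aliasing problems. */
    memcpy(&bits, &value, sizeof bits);

    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & UINT64_C(0xFFFFFFFFFFFFF);
    return f;
}
```

On such a host, 1.0 splits into a sign of zero, an exponent of 1,023 (that
is, the bias itself), and a stored fraction of zero.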

BCD Format

The BCD  format (``binary coded decimal'',  also called ``packed decimal'')
is used to eliminate rounding errors  that alter the worth of an account by
a fraction of  a cent.  It consists of a  sign, an exponent, and a chain of
four-bit numbers, each of which is  defined to hold the values zero through
nine.

A  BCD float  has a  sign  bit, seven  bits of  exponent, and  six four-bit
digits.  In the following diagrams, `d' indicates ``digit'':



    =============
    | seee eeee |   Byte 4
    |===========|
    | dddd dddd |   Byte 3
    |===========|
    | dddd dddd |   Byte 2
    |===========|
    | dddd dddd |   Byte 1
    =============

A BCD double  has a sign bit, 11 bits  of exponent, and 13 four-bit digits,
as follows:



    =============
    | seee eeee |   Byte 8
    |===========|
    | eeee dddd |   Byte 7
    |===========|
    | dddd dddd |   Byte 6
    |===========|
    | dddd dddd |   Byte 5
    |===========|
    | dddd dddd |   Byte 4
    |===========|
    | dddd dddd |   Byte 3
    |===========|
    | dddd dddd |   Byte 2
    |===========|
    | dddd dddd |   Byte 1
    =============

Storing any of the hexadecimal values A through F in a digit yields
unpredictable results.
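
One way to guard against such values is to check each digit as it is
unpacked.  The following sketch handles the six digits of a BCD float laid
out as in the diagram above (Byte 4 in the high-order bits).  The function
unpack_bcd_digits() is a hypothetical name, and treating the first digit
after the exponent as the most significant is an assumption:

```c
/*
 * Hypothetical helper: unpack the six four-bit digits of a BCD float.
 * digits[0] receives the digit that follows the exponent.
 * Returns 0 on success, or -1 if any digit holds A through F.
 */
int unpack_bcd_digits(unsigned long bits, int digits[6])
{
    int i;

    for (i = 0; i < 6; i++) {
        int d = (int)((bits >> (20 - 4 * i)) & 0xFUL);

        if (d > 9)
            return -1;      /* not a decimal digit */
        digits[i] = d;
    }
    return 0;
}
```

The sign and exponent in Byte 4 are ignored here, because the text above
does not state how the exponent of a BCD number is biased or scaled.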

The following rules apply when handling BCD numbers:

-> A tiny exponent with a fraction of zero equals zero.

-> A tiny exponent with a non-zero fraction indicates a denormalized
   number.

-> A huge exponent with a fraction of zero indicates infinity.

-> A huge exponent with a non-zero fraction is, by definition, not a
   number; these non-numbers are used to indicate errors.

COHERENT Floating Point

COHERENT  286 uses  DECVAX floating-point format.   COHERENT 386  uses IEEE
floating-point  format.   Please note  that  this does  not  mean that  the
COHERENT-386  floating-point software fully  implements the  IEEE standard;
for example, it does not support denormals.

To allow you to convert binary data from one floating-point format to the
other, COHERENT comes with four functions that convert between DECVAX
format and IEEE format.  They are as follows:

decvax_d()
        Convert an IEEE double to DECVAX format.

decvax_f()
        Convert an IEEE float to DECVAX format.

ieee_d()
        Convert a DECVAX double to IEEE format.

ieee_f()
        Convert a DECVAX float to IEEE format.

For details, see their respective entries in the Lexicon.
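
The format descriptions above suggest what such a conversion involves for
a float: the layout is the same and only the bias differs (129 against
127), so the stored exponent shrinks by two.  The following is an
illustrative sketch only.  The function decvax_bits_to_ieee() is a
hypothetical name, not the Lexicon's ieee_f(), whose interface is not shown
here.  It works on bit patterns laid out as in the diagrams and ignores
byte order in memory; flushing values too small for a normalized IEEE
float to zero is an assumption:

```c
/*
 * Hypothetical helper: convert a DECVAX float bit pattern to an IEEE
 * float bit pattern.  Returns 0 on success, or -1 if the input is the
 * DECVAX ``negative zero'' pattern, which is not a number.
 */
int decvax_bits_to_ieee(unsigned long decvax, unsigned long *ieee)
{
    unsigned long sign = decvax & 0x80000000UL;
    unsigned long exponent = (decvax >> 23) & 0xFFUL;
    unsigned long fraction = decvax & 0x7FFFFFUL;

    if (exponent == 0 && fraction == 0) {
        if (sign)
            return -1;      /* not a number */
        *ieee = 0;
        return 0;
    }

    /* The result would need an IEEE exponent of zero or less. */
    if (exponent <= 2) {
        *ieee = sign;       /* assumption: flush to signed zero */
        return 0;
    }

    /* Rebias: 129 - 127 = 2. */
    *ieee = sign | ((exponent - 2) << 23) | fraction;
    return 0;
}
```

For example, the DECVAX pattern 0x40800000 (1.0) becomes the IEEE pattern
0x3F800000.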

See Also

C keywords,
data formats,
decvax_d,
decvax_f,
double,
ecvt(),
em87,
fcvt(),
float,
float.h,
gcvt(),
ieee_d,
ieee_f
The Art of Computer Programming, vol. 2, page 180ff
ANSI Standard, §6.1.2.5

Notes

The COHERENT-386 preprocessor implicitly defines the macro _IEEE, whereas
the COHERENT-286 preprocessor implicitly defines the macro _DECVAX.  These
can be used to conditionally include code that applies to a specific
edition of COHERENT.  If you are writing code that uses floating-point
numbers intensively and you want to compile it under both editions of
COHERENT, you can write code of the form:

    #ifdef _DECVAX
        ...
    #elif _IEEE
        ...
    #endif

The  C preprocessor  under each  edition of COHERENT  will ensure  that the
correct code is included for compilation.
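
A slightly fuller sketch of the same idea follows.  The macro
FLOAT_FORMAT_NAME and the function float_format_name() are hypothetical
names, and the example assumes that the preprocessor supports the ANSI
defined operator.  The #else branch is included so that the code also
compiles where neither macro is defined:

```c
/* Select a description of the floating-point format at compile time. */
#if defined(_DECVAX)
#define FLOAT_FORMAT_NAME "DECVAX"
#elif defined(_IEEE)
#define FLOAT_FORMAT_NAME "IEEE"
#else
#define FLOAT_FORMAT_NAME "unknown"     /* neither macro is defined */
#endif

/* Hypothetical helper: report which branch was compiled in. */
const char *float_format_name(void)
{
    return FLOAT_FORMAT_NAME;
}
```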