-----------------------------------------------------------
Integers Base 2
-----------------------------------------------------------
123 written base 2 is found by repeatedly %2 then div 2
123  %  2 =      1
123 div 2 = 61
61   %  2 =      1
61  div 2 = 30
30   %  2 =      0
30  div 2 = 15
15   %  2 =      1
15  div 2 = 7   
7    %  2 =      1   
7   div 2 = 3   
3    %  2 =      1
3   div 2 = 1   
1    %  2 =      1
1   div 2 = 0   

So 123 is 1111011 base 2

To see why this works, note that if 123 was already written
base 2, we would be manipulating the bits of 123 by pulling 
off the least significant bit (%2) and then shifting right (div 2).

111101  %  2 = 1
111101 div 2 = 11110
11110   %  2 = 0
11110  div 2 = 1111
...

-----------------------------------------------------------
Floating Point Number
-----------------------------------------------------------
A Floating Point Number is a number of the form

123.4567 which represents

1x10^2+2x10^1+3x10^0+4x10^(-1)+5x10^(-2)+6x10^(-3)+7x10^(-4)
1      2      3      4         5         6         7

We can re-write this as a Normalized Floating Point Number as

1.234567x10^(2) 

That is, as +/-d1.d2 d3 d4 ... dn x 10^(m) 
where d1 is non-zero. 

Floating point numbers on computers are
written using (base 2). 

Example:

12.25 (base 10) is the (base 2) number

1100.01

That is, 

1x2^3+1x2^2+0x2^1+0x2^0+0x2^(-1)+1x2^(-2)
1     1     0     0     0        1

Normalized, this is

1.10001 x 2^(3)

Def: When a number is written down as
+/- d1.d2 d3 ... .dn x 2^(e) we call 
                +/- the sign
    d1 d2 d3 ... dn the mantissa
                  e the exponent

-----------------------------------------------------------
Adding/Subtracting Floating Point Numbers
-----------------------------------------------------------
Note: We will consider a normalized (leading number is non-zero) 
system with 6 digits 

By Example, base 10.

    1.23450x10^0   = 1.23450 = 1.23450x10^0
   +4.50000x10^-2  = 0.04500 = 0.04500x10^0
  --------------------------
                     1.27950 = 1.27950x10^0

  So convert to the same exponent and then add/subtract.

    1.23450x10^12  = 1.23450x10^12
   +4.50000x10^10  = 0.04500x10^12
  --------------------------
                   = 1.27950x10^12

    1.23450x10^12  = 1.23450x10^12
   -1.23400x10^12  =-1.23400x10^12
  --------------------------
                   = 0.00050x10^12
                   = 5.00000x10^8  (normalize the result)

Possible problem! Order of arithmetic matters!

Exercise: Come up with an example, where
(a+b)-a == 0  with a,b!=0

a = 1.23450x10^12
b = 1.25000x10^2

Compute (a+b)

    1.23450x10^12  = 1.23450x10^12
 +  1.25000x10^2   = 0.000000000125000x10^12 
                   = 0.00000x10^12 (in our system!)
-------------------------------------
                   = 1.23450x10^12

(a+b)-a = (1.23450x10^12 + 1.25000x10^2) - 1.23450x10^12
        = (1.23450x10^12               ) - 1.23450x10^12
        = 0.00000x10^12 
        = 0.00000x10^0

Note: 
(a-a)+b = (1.23450x10^12 - 1.23450x10^12) + 1.25000x10^2
        = (0.00000x10^12                ) + 1.25000x10^2
        = (0.00000x10^0                 ) + 1.25000x10^2
        = (0.00000x10^2                 ) + 1.25000x10^2
        = 1.25000x10^2


-----------------------------------------------------------
The IEEE 754 Floating Point Single Precision Standard
-----------------------------------------------------------

The IEEE 754 Floating Point Single Precision Standard
defines how 32 bits (4 bytes) are used to represent
floating point numbers. 

There are two types of numbers represented:
Normalized and De-Normalized 

-----------------------------------------------------------
Normalized numbers
-----------------------------------------------------------
The standard states that 32 bits are to be used as follows

s[eeeeeeee]mmmmmmmmmmmmmmmmmmmmmmm
1    8             23        bits

represents the number...

s 1.mmmmmmmmmmmmmmmmmmmmmmm x 2^(eeeeeeee-127)

----------------------------------
s[eeeeeeee]mmmmmmmmmmmmmmmmmmmmmmm
^ sign
----------------------------------

s is the sign bit
	s is 0 if the number is positive
	s is 1 if the number is -ve

----------------------------------
s[eeeeeeee]mmmmmmmmmmmmmmmmmmmmmmm
  ^^^^^^^^ exponent
----------------------------------

eeeeeeee is the 8 bits of the exponent
	written in excess 127, that is,
	realExponent = eeeeeeee-127

For Normalized numbers, 
eeeeeeee!=00000000
eeeeeeee!=11111111

Example: 
If the bits of the exponent are
00000010 (this is 2 base 10)
then realExponent = 00000010-127 = -125

The range of normalized exponents is then

00000001-127 to 111111110-127 =
        -126 to 127

----------------------------------
s[eeeeeeee]mmmmmmmmmmmmmmmmmmmmmmm
  mantissa ^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------

Since we are using base 2, for normalized numbers,
we know the leading digit is 1, so we don't need to
waste a bit of data on it. That is, for normalized
numbers, the 1 is hidden so the bits

s[eeeeeeee]mmmmmmmmmmmmmmmmmmmmmmm

represents

s 1.mmmmmmmmmmmmmmmmmmmmmmm x 2^(eeeeeeee-127)

Example:
12.25 (base 10)
= 1100.01
= 1.10001 x 2^(3)
= 1.10001 x 2^(130-127)
= 1.10001 x 2^(10000010-127)
= 0[10000010]10001000000000000000000

Example: .1 (=1/10) written base 2 is 

               ----
0.00011001100110011 x 2^(0)
=
1.10011001100110011 x 2^(-4) (normalized)

Get this by repeatedly x2 then look at ones digit.
Why does this work? Same intuition as in Integers Base 2
above.

Note: So some nice numbers (base 10) are not nice base 2!
That is they don't have a finite representation base 2.

.1 (base 10) is represented in IEEE 754 as

0[01111011]10011001100110011001101=+1.10011001100110011001101x2^(-4)=0.1

Note: the hidden 1, also the rounded last digit.
-----------------------------------------------------------
Rounding (to Even)
-----------------------------------------------------------
IEEE754 standard says round numbers to nearest representable
number. If there is a tie (ie midway between two representable
numbers), then round to the one with a 0 least significant digit.

Example:  

+1.10011001100110011001100x2^(-4)
                         ^ least significant bit in IEEE754 Single
.1 base 10 is            V      ----
+1.100110011001100110011001100110011x2^(-4)=0.1
                         ^ 
                         V 
+1.10011001100110011001101x2^(-4)   (this number chosen, nearest)

Example: 

+1.10011001100110011001100x2^(-4)   (this number chosen, broken tie)
                         ^ least significant bit in IEEE754 Single
                         V      
+1.100110011001100110011001x2^(-4)
                         ^ 
                         V 
+1.10011001100110011001101x2^(-4)

Example: 

+1.10011001100110011001101x2^(-4)
                         ^ least significant bit in IEEE754 Single
                         V      
+1.100110011001100110011011x2^(-4)
                         ^ 
                         V 
+1.10011001100110011001110x2^(-4)   (this number chosen, broken tie)

-----------------------------------------------------------
The distribution of floating point numbers
-----------------------------------------------------------

If we have 32 bit integers, no matter how large the numbers
are, then next one is +1 away. This is not the case for floating
point numbers. Consider

+1.00000000000000000000000 x 2^(0)
The next number is 
+1.00000000000000000000001 x 2^(0)
that is, the difference between them is
+0.00000000000000000000001 x 2^(0) = 1 x 2^(-23)

Now consider
+1.00000000000000000000000 x 2^(50)
the next number is 
+1.00000000000000000000001 x 2^(50)
that is, the difference between them is
+0.00000000000000000000001 x 2^(50) = 1 x 2^(27)
= 

That is, while the exponent is 50, consecutive representable
numbers are 1x2^(27) apart.

So the normalized floating point number line looks like 

0 ....  .  .  .  .    .    .    .    .        .        .        .        .

where each block is a collection of numbers with the same exponent.

-----------------------------------------------------------
Special Numbers
-----------------------------------------------------------

Def: Machine Epsilon (eps) is the smallest floating point 
number in the system such that 1+eps>1.

Example: From the investigation above, we see that
for IEEE754, eps = 1x 2^(-23), essentially capturing
the number of bits represented in the mantissa.

Def: Underflow (ufl) is the smallest positive representable number

Example: In the IEEE 754 Single Standard, it appears that ufl is
0[00000001]00000000000000000000000 = 1.00000000000000000000000 x 2^(-126)

BUT IEEE 754 allows Denormalized numbers (see below), 
so ufl = 

0[00000000]00000000000000000000001 = 0.00000000000000000000001 x 2^(-126)
     ^                               ^ 
     |                               |
denormalized number -----------------+

= 1 x 2^(-126-23) = 1 x 2^(-149)

Def: overflow is the largest representable number

Example: In the IEE754 Single Precision standard, this is

  0[11111110]11111111111111111111111 
= +1.11111111111111111111111 x 2^(127)
= overflow

-----------------------------------------------------------
Classes of Numbers
-----------------------------------------------------------

0[00000000]00000000000000000000000 = 0

0[11111111]00000000000000000000000 = Infinity

0[11111111]mmmmmmmmmmmmmmmmmmmmmmm = NaN (Not a Number)
 where mmmmmmmmmmmmmmmmmmmmmmm != 00000000000000000000000

0[00000000]mmmmmmmmmmmmmmmmmmmmmmm = Denormalized Numbers
 where mmmmmmmmmmmmmmmmmmmmmmm != 00000000000000000000000

0[eeeeeeee]mmmmmmmmmmmmmmmmmmmmmmm = Normalized Numbers
 where eeeeeeee!=00000000 and eeeeeeee!=11111111

-----------------------------------------------------------
Denormalized numbers
-----------------------------------------------------------

In IEEE754, if the exponent is 0000000, 
1) the leading hidden digit is 0 (not 1)
2) switch to excess 126

Example:
0[00000010]00000000000000000000000 = 1.00000000000000000000000 x 2^(-125)
                                   = 1.00000000000000000000000 x 2^(-125)
divide by 2...
0[00000001]00000000000000000000000 = 1.00000000000000000000000 x 2^(-126)
                                   = 1.00000000000000000000000 x 2^(-126)
divide by 2...
0[00000000]10000000000000000000000 = 0.10000000000000000000000 x 2^(-126)
                                   = 1.0000000000000000000000  x 2^(-127)
divide by 2...
0[00000000]01000000000000000000000 = 0.01000000000000000000000 x 2^(-126)
                                   = 1.000000000000000000000   x 2^(-128)
                                                            ^^
                                                            ||
Note: There is a loss of precision -------------------------++

so the smallest denormalized number is 

divide by 2...
0[00000000]00000000000000000000001 = 0.00000000000000000000001 x 2^(-126)
                                   = 1.                        x 2^(-149)
-----------------------------------------------------------
So what!!
-----------------------------------------------------------
The issues:

1) Can't represent nice base 10 numbers exactly. (Error introduced here)

2) When performing arithmetic between two representable numbers, 
   may not be able to exactly represent the result (Error introduced here)

3) Based on the above, comparing floating point numbers is problematic.
   At best compare via a range of numbers. 

   Example: x==0.0f vs -.0001 <= x && x <= .0001

4) The order in which the calculation is done matters, different order
   could mean different resulting value. Best to calculate with 
   similar values.

   Example: in IEEE 754 (done left to right)

   1.0...0 x 2^(0) + 1.0...0 x 2^(-100) - 1.0..0 x 2^(0)    = 0
   1.0...0 x 2^(0) - 1.0...0 x 2^(0)    + 1.0..0 x 2^(-100) = 1.0..0 x 2^(-100)

5) Values may underflow or overflow during calculation, leading to 
   incorrect values. Again, the order of calculation matters.

6) Consequences!!
   http://www.ima.umn.edu/~arnold/disasters/patriot.html
   http://www.ima.umn.edu/~arnold/disasters/sleipner.html
   https://itsfoss.com/a-floating-point-error-that-caused-a-damage-worth-half-a-billion/
   https://web.ma.utexas.edu/users/arbogast/misc/disasters.html


-----------------------------------------------------------
Some References
-----------------------------------------------------------
https://web.archive.org/web/20081212190209/https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/float.html