Limitations of 32-bit Floating Point Math in a SCADAPack
After adding a small number to a floating point register many times, the stored total loses accuracy over time. What's going on?
NOTE: This scenario can also occur with a FLOW or TOTL function block in TelePACE when it is not periodically logging data and resetting the accumulator.
The SCADAPack controllers store each real (floating point) number across two 16-bit registers as a 32-bit floating point value according to IEEE-754.
Of the 32 bits, one is used for the sign, 8 are used for the exponent and 23 are used for the significand.
There are many handy floating point converters available online.
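As an alternative to an online converter, the 1/8/23 bit layout can be inspected with a few lines of Python on a PC. This is a minimal sketch, not controller code: the function name decode_f32 is illustrative, and the two 16-bit words are printed high word first to match the hex notation used in this article, which is not necessarily the register order in the controller.

```python
import struct

def decode_f32(value):
    """Round value to IEEE-754 single precision and split it into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                       # 1 sign bit
    exponent = ((bits >> 23) & 0xFF) - 127  # 8 exponent bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF              # 23 bits after the implied leading 1
    words = f"{bits >> 16:04X} {bits & 0xFFFF:04X}"  # two 16-bit words, high word first
    return words, sign, exponent, format(fraction, "023b")

print(decode_f32(890.914954861))
print(decode_f32(0.247545139))
```

For 890.914954861 this returns the words 445E BA8F, a sign of 0, an exponent of 9 and the 23 fraction bits 10111101011101010001111.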
Excel does not use 32-bit floating point to store real numbers, so it can hold many more digits accurately.
For example, in Excel: 890.914954861 + 0.247545139 = 891.162500000
If we look at this same calculation with 32 bit floating point math on the SCADAPack:
890.914954861 stored as a 32 bit FP = 445E BA8F (hex) = 890.91497802734375 (dec) = 1.10111101011101010001111b9 (binary)
0.247545139 stored as a 32 bit FP = 3E7D 7C79 (hex) = 0.24754513800144195556640625 (dec) = 1.11111010111110001111001b-3 (binary)
At the controller level, the ADDF function operates on the binary values, but it first needs to align the binary points so that both numbers share the same exponent.
1.11111010111110001111001b-3 (binary) = 0.00000000000111111010111110001111001 b9, which is then truncated to 0.00000000000111111010111 b9.
Because the two numbers are several orders of magnitude apart, a lot of the precision of the smaller number gets lost in the controller.
The sum is 891.1624755859375 in the controller, which differs from the exact value of 891.1625 by about 0.000024.
NOTE: These effects are magnified as the numbers being added together get further and further apart.
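The align-then-truncate steps described above can be reproduced on a PC with integer arithmetic. This is only a sketch of the steps as this article describes them, not the controller's actual ADDF implementation: the function names are hypothetical, and it handles positive, normal numbers only.

```python
import struct

def f32_parts(value):
    """Round value to float32; return (exponent, 24-bit significand with the implied 1)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    exponent = ((bits >> 23) & 0xFF) - 127
    significand = (bits & 0x7FFFFF) | 0x800000
    return exponent, significand

def add_truncating(a, b):
    """Add two positive normal float32 values, truncating bits lost in alignment."""
    ea, ma = f32_parts(a)
    eb, mb = f32_parts(b)
    if ea < eb:                      # make (ea, ma) the larger-exponent operand
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= ea - eb                   # align binary points; shifted-out bits are dropped
    total = ma + mb
    while total >= 1 << 24:          # renormalize if the sum needs more than 24 bits
        total >>= 1
        ea += 1
    return total * 2.0 ** (ea - 23)  # exact: integer below 2**24 times a power of two

print(add_truncating(890.914954861, 0.247545139))
```

With the two values from the example, the function returns 891.1624755859375. Note that PC hardware normally rounds to nearest instead of truncating, so a plain 32-bit addition on a PC can differ from this figure in the last bit.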
For example, assume we are counting the number of times a discrete event occurs, using a floating point value as the counter,
e.g. 1.00 + 1.00 = 2.00; 2.00 + 1.00 = 3.00, etc.
Eventually, we will reach a point where the floating point value can no longer be incremented.
Let's look at what happens when the counter holds 16,777,220 and we add 1:
16,777,220 as a 32 bit FP = 4B80 0002 = 16,777,220 (dec) = 1.00000000000000000000010b24 (binary)
1 as a 32 bit FP = 3F80 0000 = 1.000 (dec) = 1.00000000000000000000000b0 (binary)
Again, at the controller level, the ADDF function operates on the binary values, but it first needs to align the binary points.
1.000 000 000 000 000 000 000 00 b0 (binary) = 0.000 000 000 000 000 000 000 001 b24 which will then get truncated to 0.000 000 000 000 000 000 000 00 b24 since there are only 23 bits available for the significand.
1.000 000 000 000 000 000 000 10 b24 + 0.000 000 000 000 000 000 000 00 b24
= 1.000 000 000 000 000 000 000 10 b24 = 16,777,220
So in 32 bit FP math, 16,777,220 + 1 = 16,777,220 NO CHANGE!
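The point at which a counter stops incrementing can be found on a PC by rounding every intermediate result to 32 bits. The sketch below uses Python's struct module for that rounding, which rounds to nearest instead of truncating as described above; for these values the outcome is the same. The helper names to_f32 and count_until_stall are illustrative.

```python
import struct

def to_f32(x):
    """Round x to the nearest 32-bit float, imitating storage in a floating point register."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def count_until_stall(start, step=1.0, limit=100):
    """Add step repeatedly in 32-bit precision; return the value at which it stops changing."""
    total = to_f32(start)
    for _ in range(limit):
        nxt = to_f32(total + step)
        if nxt == total:   # the addition no longer changes the stored value
            return total
        total = nxt
    return None            # no stall within `limit` additions

print(to_f32(16777220.0 + 1.0))       # the example above: the stored value is unchanged
print(count_until_stall(16777210.0))  # where a counter stepping by 1 gets stuck
```

Counting by 1 from 16,777,210 stops at 16,777,216 (2^24), the first value at which adjacent 32-bit floating point numbers are 2 apart.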
Working within the limitations of 32-bit floating point math.
Time is broken up into logical divisions to make it more manageable: seconds, minutes, hours, days, months, years, etc.
The same will have to be done within the 32-bit FP environment.
In the example above, where a counter was being incremented by 1 at a time, it might be feasible to reset (or rollover) the main counter when it reaches one million and have a separate counter to record the number of rollovers (millions).
NOTE: 1 million may not be a suitable rollover value for every application. The right value depends on the size of the incremental additions and the precision your application requires.
When the primary counter passes 1 million:
- increment the secondary counter by 1 to count the number of millions.
- reduce the primary counter by 1 million. This helps prevent a situation where a very small number is being added to a very large number.
NOTE: In the example where the totalizer is incremented by an integer value, it may be preferable to handle the counters as unsigned double integers. An unsigned double integer can count up to 4,294,967,295 (2^32 - 1).
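The rollover steps above might look like the following sketch. It is written in Python for illustration only and is not TelePACE or controller logic: the names ROLLOVER, to_f32 and add_to_totalizer are assumptions, and to_f32 merely imitates 32-bit storage on a PC.

```python
import struct

ROLLOVER = 1_000_000.0  # assumed rollover point; choose a value that suits the application

def to_f32(x):
    """Round x to the nearest 32-bit float, imitating a floating point register."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def add_to_totalizer(primary, rollovers, increment):
    """Add increment to the primary counter and carry whole millions into rollovers."""
    primary = to_f32(primary + increment)
    while primary >= ROLLOVER:
        rollovers += 1                        # secondary counter: number of millions
        primary = to_f32(primary - ROLLOVER)  # keep the primary counter small
    return primary, rollovers

primary, rollovers = 999_999.0, 0
primary, rollovers = add_to_totalizer(primary, rollovers, 1.0)
print(primary, rollovers)

# The grand total is rebuilt from both counters where more precision is available.
print(rollovers * ROLLOVER + primary)
```

Because the primary counter never grows much beyond the rollover value, small increments are always added to a comparably small number, which is the point of the scheme.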