What are some good strategies to test a floating point arithmetic implementation for double numbers?
An answer to this question on the Scientific Computing Stack Exchange.
Question
For IEEE single precision, the representation is a 1-bit sign, 8-bit exponent, and 23-bit mantissa. This means that at each exponent value, you can test all 2^23 (roughly 8.4 million) possible mantissa bit patterns. Then you do this for every exponent value (256 values), and you can essentially cover all floating-point numbers representable in IEEE single precision.
However, for double precision, such an approach is not viable. With a 52-bit mantissa, at each exponent value you would need to test 2^52 mantissa combinations (roughly 4.5 quadrillion, ~4.5e15).
This seems to suggest that you need some randomized scheme to show that your arithmetic implementation is correct with high probability. But do we know which scheme to use? Would it also be beneficial to consider how floating-point numbers are distributed (i.e., clustered around certain values, such as zero)?
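For concreteness, the exhaustive single-precision approach described above can be sketched by enumerating 32-bit patterns and reinterpreting them as floats (the loop below only samples the range so it runs quickly; a full sweep would iterate over all 2^32 patterns):

```python
import struct

def bits_to_float(u: int) -> float:
    # Reinterpret a 32-bit pattern as an IEEE-754 single-precision value.
    return struct.unpack("<f", struct.pack("<I", u))[0]

# All 2**32 bit patterns together cover every single-precision value,
# including subnormals, infinities, and NaNs.  Here we take a sparse
# sample of the full range just to demonstrate the idea.
for u in range(0, 2**32, 2**24):
    x = bits_to_float(u)
    # ... feed x to the implementation under test and compare ...
```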
Answer
You should test transition points.
Floating-point numbers have several distinct "ranges":
- Standard/Normal arithmetic
- Subnormal arithmetic
- Infinite arithmetic
- NaN arithmetic
- Zero arithmetic
For instance, if I add any normal number to infinity, I should get infinity back. If I add two large enough subnormals, I should get a normal number. Any arithmetic on a NaN yields a NaN. Adding two large normals might overflow to Inf.
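The cross-class behaviors just listed can each be checked directly; a minimal sketch in Python (using `sys.float_info` for the double-precision extremes):

```python
import math
import sys

inf = math.inf
max_normal = sys.float_info.max   # largest finite double
min_normal = sys.float_info.min   # smallest positive normal double

# Normal + infinity stays infinite.
assert max_normal + inf == inf

# Two subnormals can sum to a normal number: min_normal / 2 is subnormal,
# and adding it to itself lands exactly on the smallest normal.
assert (min_normal / 2) + (min_normal / 2) == min_normal

# Any arithmetic on NaN yields NaN.
assert math.isnan(math.nan + 1.0)

# Two large normals overflow to infinity.
assert max_normal + max_normal == inf
```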
So my testing strategy would be:
- Randomly check a few instances of math where the answer stays within a class (note that operations which affect the exponent can be distinguished from changes that affect only the mantissa). If 1+2=3, then probably I've gotten 2+3=5 correct as well.
- Spend much more time/effort checking math at the boundaries of classes, since these represent special cases.
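One way to concentrate effort at the class boundaries is to enumerate the boundary values themselves and their immediate neighbors; a sketch using `math.nextafter` (Python 3.9+), where the particular boundary list is just an illustration:

```python
import math
import sys

inf = math.inf

# Class-transition points: zero, smallest subnormal, smallest normal,
# a plain normal, largest finite, and infinity.
boundaries = [0.0, 5e-324, sys.float_info.min, 1.0, sys.float_info.max, inf]

def neighbours(x: float) -> set:
    # A boundary value, its negation, and the adjacent representable doubles.
    out = {x, -x}
    if math.isfinite(x):
        out |= {math.nextafter(x, inf), math.nextafter(x, -inf)}
    return out

test_values = sorted({v for b in boundaries for v in neighbours(b)})
```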
I'd probably write a few unit tests to explore specific cases I understand well, but then use property-based testing to be more thorough. This works especially well with things like zero, inf, and NaN.
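A property-based check can be approximated with only the standard library by drawing doubles uniformly over bit patterns (so subnormals, infinities, and NaNs all appear) and asserting an IEEE-guaranteed property; here, commutativity of addition, with NaN handled separately since NaN compares unequal to itself:

```python
import math
import random
import struct

def random_double() -> float:
    # Uniform over 64-bit patterns: every double, including subnormals,
    # infinities, and NaNs, is equally likely.
    return struct.unpack("<d", random.getrandbits(64).to_bytes(8, "little"))[0]

def check_addition_commutes(trials: int = 10_000) -> None:
    for _ in range(trials):
        x, y = random_double(), random_double()
        s1, s2 = x + y, y + x
        if math.isnan(s1) or math.isnan(s2):
            # NaN is unordered, so compare NaN-ness rather than values.
            assert math.isnan(s1) and math.isnan(s2)
        else:
            assert s1 == s2  # IEEE-754 addition is commutative

check_addition_commutes()
```

A dedicated property-based testing library (e.g. Hypothesis, whose float strategies can generate NaN and infinity on demand) would additionally shrink failing cases to minimal counterexamples.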
Finally, I'd measure code coverage to ensure that the test suite is hitting the entirety of the library.
Pre-existing test suites include:
- Kahan's Paranoia test program
- Schryer's "A Test of a Computer’s Floating-Point Arithmetic Unit" (I haven't found source code for this)