by SeanLuke
1677 days ago
The other examples he gave trade off significant math deficiencies for small speed gains. But flushing subnormals to zero can produce a MASSIVE speed gain, like 1000x. And including subnormals isn't necessarily good floating-point practice; as I understand it, they were rather controversial during the development of IEEE 754. The tradeoff here is markedly different from the other cases.

For example, AMD Zen CPUs have negligible penalties for handling denormals, but many Intel models incur a penalty of 100 to 200 clock cycles for each operation involving denormals.
Even on the CPU models with slow denormal handling, a speedup of 100 to 1000 times exists only for the operation with denormals itself, and only when that operation belongs to a stream of operations running at the CPU's maximum SIMD throughput, because during those hundred-odd lost clock cycles the CPU could otherwise have completed 4 or 8 operations per cycle.
No complete computation can have a significant percentage of operations with denormals unless it is written extremely badly.
So for a complete computation, even on the models with poor denormal handling, a speedup of more than a few times would be abnormal.
The only controversy that has ever existed about denormals is that handling them at full speed increases the cost of the FPU, so lazy or greedy companies, mainly Intel, preferred to add a flush-to-zero option for gamers instead of designing the FPU properly.
When the correctness of the results is not important, as in many graphics or machine-learning applications, using flush-to-zero is OK; otherwise it is not.