|
|
|
|
|
by adrian_b
1678 days ago
|
|
I agree that this happens, but solving such cases with FTZ is a lazy solution, which is guaranteed to give bad results, due to the loss of precision. Even when 32-bit floating point is used, for a greater dynamic range and for a 24-bit precision, instead of using 16-bit fixed-point numbers, proper DSP algorithm implementations still need to use some of the techniques that are necessary with fixed-point number algorithms, i.e. suitable scale factors must be inserted in various places. A correct implementation must avoid almost all underflows and overflows, by appropriate scalings. |
|