|
|
|
|
|
by thegeomaster
1154 days ago
|
|
One important point that the article doesn't touch on is determinism. rsqrtps is implemented differently on different CPUs, and so if having reproducible floating point results is a requirement (there's a lot of use-cases for this), you simply cannot use it. Your only remaining option is to use a totally IEEE-754 compliant algorithm that is guaranteed to work the same on every CPU that implements IEEE-754 floats, and for that there's still no better approach than using the Q_rsqrt idea, of course with some modifications for the "modern age". |
|
That applies to x86 for sure. But the designers of ARM and RISC-V had the foresight to standardise the implementation of rsqrt to make it deterministic ... on each respective platform. But on either, the precision is only 7 bits.
Recent AMD and Intel x86-64 processors use 11 input bits, and the results are similar enough that only 22 results over the whole input range are different. Source: <https://robert.ocallahan.org/2021/09/emulating-amd-rsqrtss-e...>