| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by gajjanag 62 days ago

The bigger challenge is GPU/NPU. Branches for fast vs accurate path get costlier, among other things. On CPU this is less of a cost.

Most published libm on GPU/NPU side have a few ULP of error for the perf vs accuracy tradeoff. Eg, documented explicitly in the CUDA programming guide: https://docs.nvidia.com/cuda/cuda-programming-guide/05-appen... .

Prof. Zimmermann and collaborators have a great table at https://members.loria.fr/PZimmermann/papers/accuracy.pdf (Feb 2026) comparing various libm wrt accuracy.