by vardump 828 days ago
CORDIC is great for hardware implementations because it's simple (just shifts, adds, and a small lookup table) and you get good accuracy. I wouldn't be surprised if that's still how modern x86-64 CPUs compute sin, cos, etc.
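
For anyone who hasn't seen it, here's a minimal rotation-mode CORDIC sketch in Python. It uses floats for readability; real hardware would use fixed-point, where the multiply by 2^-i is just a bit shift. The names (`cordic_sin_cos`, `ITERATIONS`, etc.) are made up for illustration, and this is not a claim about what any actual CPU does internally.

```python
import math

# Number of iterations; each one adds roughly one bit of accuracy.
ITERATIONS = 32

# Precomputed rotation angles atan(2^-i); in hardware this is a small ROM.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
# Starting from x = GAIN instead of x = 1 cancels that stretch up front.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) for theta in [-pi/2, pi/2]."""
    x, y, z = GAIN, 0.0, theta
    for i in range(ITERATIONS):
        # Rotate toward zero residual angle.
        d = 1.0 if z >= 0.0 else -1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(s, c)
```

Angles outside [-pi/2, pi/2] would need a range-reduction step first, which this sketch leaves out.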

That said, last time I had to do that in software, I used Taylor series. Might not have been an optimal solution.
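
A rough sketch of what the Taylor approach can look like in software: reduce the argument to [-pi/2, pi/2], then sum a truncated series. `taylor_sin` and the default of 10 terms are illustrative choices, not taken from any real library. A minimax polynomial would typically reach the same accuracy with fewer terms, which is one reason plain Taylor may not be optimal.

```python
import math


def taylor_sin(x, terms=10):
    """Approximate sin(x) with a truncated Taylor series after range reduction."""
    # Reduce to [-pi, pi]; sine is periodic with period 2*pi.
    x = math.remainder(x, 2.0 * math.pi)
    # Fold into [-pi/2, pi/2], where the series converges fastest.
    if x > math.pi / 2:
        x = math.pi - x       # sin(pi - x) == sin(x)
    elif x < -math.pi / 2:
        x = -math.pi - x      # sin(-pi - x) == sin(x)

    # Build each term from the previous one:
    # term_n = term_(n-1) * (-x^2) / ((2n) * (2n + 1))
    term = x
    total = x
    x2 = x * x
    for n in range(1, terms):
        term *= -x2 / ((2 * n) * (2 * n + 1))
        total += term
    return total


print(taylor_sin(2.0), math.sin(2.0))
```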

EDIT:

AMD's Zen 4 takes 50-200 cycles (latency) to compute sine. I think that strongly suggests AMD uses CORDIC. https://www.agner.org/optimize/instruction_tables.pdf page 130.

Same for Intel: Tiger Lake (Intel 11th gen) has 60-120 cycles of latency. Page 353.

I'd guess it's usually ~50 cycles on Zen 4 (and ~60 on Intel) for float32, float64, and float80 operands. Denormals might also cost more cycles.


They switched away from CORDIC at one point: https://www.intel.com/content/www/us/en/developer/articles/t...

(there doesn't seem to actually be a linked article there, just the summary)

Pretty weird that Intel's sine computation latency hasn't changed much over the years. It's been roughly the same for 20 years.

EDIT: That's a paper for a software library, not the CPU's internal implementation. The latter is probably still done with CORDIC.

> EDIT: That's a paper for a software library, not the CPU's internal implementation.

Unless you're seeing something I'm not, it's talking about x87, which hasn't been anything other than 'internal' since they stopped selling the 80486SX.

Ah, you're right.

Anyway, I wonder why it's still so slow.

60-120 cycles sure looks like a CORDIC implementation, but perhaps not.