by andhow 4614 days ago
Referring to the Nehalem (Intel Core i7) and Jaguar (AMD Kabini) instruction tables at http://agner.org/optimize: you are right that addss and subss show the same latency as addsd and subsd, respectively. However, mulss and divss show lower latencies than mulsd and divsd.

But if your code vectorizes, can't you get twice the throughput on the vectorized sections?
Agreed. I was just pointing out to the OP that, even on desktop with scalar ops, single precision has advantages over double precision.
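
For what it's worth, here is a rough sketch of where the "twice the throughput" figure comes from. It assumes 128-bit SSE registers (the width on the parts mentioned above); the names `VECTOR_BYTES`, `lanes_per_vector`, `mul_f32` and `mul_f64` are made up for illustration, and whether the loops actually vectorize depends on your compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 128-bit (16-byte) SSE vector registers. */
enum { VECTOR_BYTES = 16 };

/* How many elements of the given size fit in one vector register. */
size_t lanes_per_vector(size_t elem_size) {
    assert(elem_size > 0 && elem_size <= VECTOR_BYTES);
    return VECTOR_BYTES / elem_size;
}

/* Element-wise multiply on floats. A loop this simple is a typical
   candidate for auto-vectorization (e.g. to mulps), but that is not
   guaranteed. */
void mul_f32(float *restrict out, const float *restrict a,
             const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}

/* Same loop on doubles. If vectorized (e.g. to mulpd), each
   instruction covers half as many elements as the float version. */
void mul_f64(double *restrict out, const double *restrict a,
             const double *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

With 4-byte floats and 8-byte doubles, `lanes_per_vector` gives 4 and 2, so a packed float op touches twice as many elements per instruction. That is only an upper bound on the speedup; actual throughput also depends on memory bandwidth and the per-instruction figures in the tables.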