| HN Mirror



	by andhow 4614 days ago
	Referring to the instruction tables at http://agner.org/optimize for both Nehalem (Intel Core i7) and Jaguar (AMD Kabini): you are right that addss and subss show the same latency as addsd and subsd, respectively. However, mulss and divss show lower latencies than mulsd and divsd.


But if your code vectorizes, can't you get twice the throughput on the vectorized sections by using single precision?
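A rough sketch of where the "twice" comes from, assuming 128-bit SSE registers (256-bit AVX is shown only for comparison). The helper `lanes_per_register` is a made-up name for illustration, and the result is an upper bound on elements per instruction, not a measured speedup:

```python
import struct

def lanes_per_register(register_bits, fmt):
    """Return how many elements of the given struct format fit in one register.

    `lanes_per_register` is a hypothetical helper for illustration only.
    """
    element_bits = struct.calcsize(fmt) * 8  # "<f" is 4 bytes, "<d" is 8 bytes
    return register_bits // element_bits

SSE_BITS = 128  # assumption: XMM register width
AVX_BITS = 256  # assumption: YMM register width, for comparison

single_lanes = lanes_per_register(SSE_BITS, "<f")  # packed single, e.g. mulps
double_lanes = lanes_per_register(SSE_BITS, "<d")  # packed double, e.g. mulpd

# Elements processed per packed instruction, single vs. double.
ratio = single_lanes / double_lanes

print(single_lanes, double_lanes, ratio)
```

The ratio stays the same at any register width, since a float is half the size of a double. Whether real code sees the full factor depends on memory bandwidth and on how well the loop vectorizes.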

Agreed. I was just pointing out to the OP that, even on desktop with scalar ops, single precision has advantages over double precision.