| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by 323 1419 days ago
	But the question was how many execution ports it has, not what's the latency/throughput of one port.

1 comments

corsix 1419 days ago

Given latency 3 / throughput 1, the only reasonable implementations are: A) Three ports, each non-pipelined, taking 3 cycles B) One port, with three-cycle pipeline (each cycle, one instruction can enter the start of the pipeline, and anything in-progress moves forward one stage) Given that CRC isn't too hard to pipeline, and (B) requires less physical hardware, it is almost certainly (B).

link

MrFuchs 1419 days ago

From the port usage reported at https://uops.info/html-instr/CRC32_R64_R64.html, we can conclude (B) for the Intel microarchitectures. For AMD it's not entirely obvious, but I agree (B) appears more likely.

link