HN Mirror



	by zozbot234 1568 days ago
	> You might end up masking part of the cost of some stalls if you are able to swap in other ready-to-run tasks. You'd need SMT to do this for memory stalls, and Apple M1 doesn't use SMT - it has the same number of logical cores (hardware threads) as physical cores.

2 comments

monocasa 1567 days ago

Source? Every unified programmable GPU I've seen uses SMT, including the PowerVR GPUs going back to the SGX days. It's core to how they approach modern memory hierarchies.


monocasa 1567 days ago

Looking into it more, AGX2 (like pretty much every fairly high perf modern GPU) is heavily SMT, allowing up to 1024 simultaneous threads per core depending on how many registers each shader invocation needs.

https://rosenzweig.io/blog/asahi-gpu-part-3.html
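
To make the register trade-off concrete, here is a rough sketch of how threads in flight shrink as each shader invocation asks for more registers. The 1024 cap is the figure quoted above; the register file size is a made-up placeholder, not AGX2's real number, and `resident_threads` is a hypothetical helper name. Real hardware also allocates registers in coarser granules, which this ignores.

```python
# Illustrative occupancy model. Only the 1024-thread cap comes from the
# comment above; the register file size is an ASSUMPTION for illustration.
MAX_THREADS_PER_CORE = 1024
ASSUMED_REGISTER_FILE = 64 * 1024  # hypothetical total registers per core


def resident_threads(regs_per_thread: int,
                     register_file: int = ASSUMED_REGISTER_FILE,
                     max_threads: int = MAX_THREADS_PER_CORE) -> int:
    """Threads that fit at once if the register file is the only limit."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    # Each resident thread holds its own registers, so the file is
    # divided among threads; the hardware cap bounds the result.
    return min(max_threads, register_file // regs_per_thread)


if __name__ == "__main__":
    for regs in (32, 64, 128, 256):
        print(regs, "regs/thread ->", resident_threads(regs), "threads")
```

Under these assumed numbers, light shaders hit the 1024 cap while a 256-register shader leaves far fewer threads available to hide memory stalls.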
