Hacker News
by zozbot234 1521 days ago
> You might end up masking part of the cost of some stalls if you are able to swap in other ready-to-run tasks.

You'd need SMT to do this for memory stalls, and the Apple M1 doesn't use SMT: it has the same number of logical cores (hardware threads) as physical cores.
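A rough, idealized model makes the stall-hiding argument concrete. This is an assumption for illustration, not something from the thread. Each hardware thread alternates a fixed burst of `compute_cycles` of work with a fixed `stall_cycles` memory wait, and the core switches to another ready thread at zero cost. Under those assumptions, utilization is min(1, N*C/(C+M)) for N threads, so a core with one hardware thread (no SMT) can't fill stall cycles by switching threads. The function `utilization` and its parameters are hypothetical. The model also ignores out-of-order execution, which lets a single thread overlap some of its own stalls.

```python
def utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles the core issues work in an idealized SMT model.

    Hypothetical model: every thread repeatedly does `compute_cycles` of work,
    then waits `stall_cycles` on memory. The core switches to any ready thread
    instantly, so it stays busy as long as enough threads have work queued.
    """
    if threads < 1 or compute_cycles < 1 or stall_cycles < 0:
        raise ValueError("invalid parameters")
    period = compute_cycles + stall_cycles
    # Each thread can supply at most compute_cycles of work per period,
    # and the core can't be more than 100% busy.
    return min(1.0, threads * compute_cycles / period)


if __name__ == "__main__":
    # Assumed workload: 10 cycles of work, then a 90-cycle memory stall.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:>2} threads -> {utilization(n, 10, 90):.0%} busy")
```

With this made-up workload, a single thread keeps the core busy only 10% of the time, and it takes 10 threads to saturate it.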

Source? Every unified programmable GPU I've seen uses SMT, including the PowerVR GPUs going back to the SGX days. It's core to how they approach modern memory hierarchies.
Looking into it more, AGX2 (like pretty much every fairly high-performance modern GPU) is heavily SMT, allowing up to 1024 simultaneous threads per core depending on how many registers each shader invocation needs.

https://rosenzweig.io/blog/asahi-gpu-part-3.html
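The register/occupancy trade-off described above can be sketched as simple arithmetic. A core has a fixed register file and each resident thread needs its own registers, so fewer registers per shader invocation means more threads in flight, up to a hardware cap. The 1024-thread cap comes from the comment. The register-file size and the 32-wide scheduling group are placeholder values, not Apple's figures, and `resident_threads` is a hypothetical helper.

```python
# Placeholder numbers for illustration only; not Apple's published figures.
MAX_THREADS_PER_CORE = 1024     # upper bound quoted in the comment
REGISTER_FILE_SIZE = 32 * 1024  # assumed: 32-bit registers available per core
SIMD_WIDTH = 32                 # assumed: threads are allocated in groups of 32


def resident_threads(regs_per_thread: int) -> int:
    """Estimate how many threads can be resident on one core at once."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    # Each resident thread needs its own private set of registers.
    by_registers = REGISTER_FILE_SIZE // regs_per_thread
    # Round down to whole SIMD groups, then apply the hardware thread cap.
    by_registers -= by_registers % SIMD_WIDTH
    return min(MAX_THREADS_PER_CORE, by_registers)


if __name__ == "__main__":
    for regs in (16, 32, 64, 128, 256):
        print(f"{regs:>3} registers/thread -> {resident_threads(regs)} threads")
```

With these placeholder numbers, a shader using 64 registers per thread would get 512 resident threads, half the cap. Real hardware has additional allocation rules that this sketch ignores.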