by xoranth 736 days ago
It's the same reason you sometimes batch operations in software:

When you add two numbers, the GPU needs to do a lot more stuff besides the addition itself (fetch and decode the instruction, schedule it, and so on).

If you implemented SIMT by having multiple cores, you would need to do the extra stuff once per core, so you wouldn't save power (and you have a fixed power budget). With SIMD, you get $NUM_LANES additions, but you do the extra stuff only once, saving power.
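
To put rough numbers on that, here is a toy model. The cost constants are made up for illustration (arbitrary units, not measurements), and the function names are hypothetical:

```python
# Toy energy model. The constants are invented units, not real measurements.
OVERHEAD = 20  # assumed per-instruction cost of the "extra stuff"
ADD = 1        # assumed cost of the addition itself


def energy_multicore(num_adds):
    # One scalar core per addition: every addition pays the full overhead.
    return num_adds * (OVERHEAD + ADD)


def energy_simd(num_adds, num_lanes):
    # One instruction drives num_lanes additions, so the overhead is paid
    # once per instruction instead of once per addition.
    instructions = -(-num_adds // num_lanes)  # ceiling division
    return instructions * OVERHEAD + num_adds * ADD


print(energy_multicore(32))  # 672
print(energy_simd(32, 32))   # 52
```

The additions cost the same either way; the only thing that changes is how many times you pay the overhead.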

(See this article by OP, which goes into more details: https://yosefk.com/blog/its-done-in-hardware-so-its-cheap.ht... )