by calaphos 1062 days ago
The CUDA cores of NVIDIA GPUs are closer to FP32 units in vector ALUs than to CPU cores capable of operating independently in parallel. By that definition, a modern CPU core would have dozens of "CUDA cores" as well (although far fewer than GPUs optimized for that kind of workload).

More comparable would be the ~130 streaming multiprocessors of an H100.
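
To put rough numbers on that comparison, here is a minimal sketch that counts FP32 lanes the way "CUDA cores" are counted. The hardware figures are assumptions for illustration: a CPU core with two 512-bit FMA units, and an H100 SXM with 132 SMs of 128 FP32 lanes each. Other CPUs and other H100 variants will give different numbers.

```python
FP32_BITS = 32

def fp32_lanes(vector_bits: int, vector_units: int) -> int:
    """FP32 lanes a core can drive per cycle, counted like 'CUDA cores'."""
    return (vector_bits // FP32_BITS) * vector_units

# Assumed CPU core: two 512-bit vector FMA units.
cpu_lanes_per_core = fp32_lanes(vector_bits=512, vector_units=2)

# Assumed GPU: H100 SXM, 132 SMs with 128 FP32 lanes per SM.
gpu_sms = 132
gpu_lanes_per_sm = 128
gpu_total_lanes = gpu_sms * gpu_lanes_per_sm

print("'CUDA cores' per CPU core:", cpu_lanes_per_core)
print("'CUDA cores' per SM:", gpu_lanes_per_sm)
print("'CUDA cores' per GPU:", gpu_total_lanes)
```

Under these assumptions, the fair unit-to-unit comparison is one CPU core against one SM, not one CPU core against one CUDA core.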


The fact that each cuda core also has its own instruction pointer is quite misleading. You would think this lets each cuda core run its own instruction stream, but it does not. The driver uses these instruction pointers for finer scheduling granularity. That is cool, but it is not the same thing.

https://stackoverflow.com/questions/58071834/why-does-each-t...

NVIDIA's peculiar terminology seems to have been chosen with the sole purpose of causing confusion, because every name NVIDIA invented is applied to something that had already had a traditional name for decades. In that terminology, "warp" means thread and "thread" means SIMD lane.

NVIDIA has never given a good explanation of what they mean by the "instruction pointer" that belongs to each NVIDIA "thread". It certainly does not mean what it normally means, i.e. a special register that contains the address from which the next instruction will be fetched for execution. I believe that this "instruction pointer" refers to a register where the actual instruction pointer is saved when a "thread" is stalled because it has diverged into two branches after a condition test and only one of the branches continues to be executed, while the other branch must be executed later, with the complementary predicate.

These saved instruction pointers are presumably used for scheduling the "threads" to be executed by the SIMD lanes provided by the hardware, in such a way as to satisfy the cross-lane dependencies.
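
A toy model of that conjecture, sketched in Python: one "warp" executes an if/else in lockstep, the not-taken branch is parked together with its lane mask (standing in for the saved instruction pointer), and it is resumed later under the complementary predicate. The function name `run_warp` and the `pending` list are hypothetical and only illustrate the idea; nothing here reflects how NVIDIA hardware is actually implemented.

```python
def run_warp(values):
    """Toy model: one warp runs `x = x*2 if x is even else x+1` in lockstep."""
    lanes = list(values)
    taken = [x % 2 == 0 for x in lanes]     # per-lane predicate
    not_taken = [not t for t in taken]      # complementary predicate

    # Hypothetical "saved instruction pointer": the deferred branch plus
    # the lanes that must resume there later.
    pending = [("else", not_taken)]
    current = ("then", taken)

    trace = []                              # branch passes actually issued
    while current is not None:
        label, mask = current
        if any(mask):                       # skip a path that no lane needs
            trace.append(label)
            for i, active in enumerate(mask):
                if active:                  # inactive lanes sit idle this pass
                    lanes[i] = lanes[i] * 2 if label == "then" else lanes[i] + 1
        current = pending.pop() if pending else None
    return lanes, trace

print(run_warp([1, 2, 3, 4]))   # diverged: both branches are issued in turn
print(run_warp([2, 4, 6, 8]))   # uniform: only one branch is issued
```

In the diverged case the warp pays for both passes even though each lane only needs one of them, which is why per-"thread" instruction pointers do not give you independent cores.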