by imtringued
1072 days ago
The fact that each CUDA core also has its own instruction pointer is quite misleading. You would think that this lets each CUDA core run different instructions, but the opposite is the case. The driver uses these instruction pointers for finer scheduling granularity. That is cool, but it is not the same thing. https://stackoverflow.com/questions/58071834/why-does-each-t...
|
NVIDIA has never given a good explanation of what they mean by the "instruction pointer" that belongs to each NVIDIA "thread". It certainly does not mean what it normally means, i.e. a special register that contains the address from which the next instruction will be fetched for execution. I believe that this "instruction pointer" refers to a register where the actual instruction pointer is saved when a "thread" is stalled. That happens after a condition test makes the "threads" diverge into two branches: only one of the branches continues to be executed, while the other branch must be executed later, with the complementary predicate.
These saved instruction pointers are presumably used for scheduling the "threads" to be executed by the SIMD lanes provided by the hardware, in such a way as to satisfy the cross-lane dependencies.
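
To make that guess concrete, here is a toy Python model of it. This is only a sketch of the interpretation above, not NVIDIA's documented mechanism: the instruction set, `PROGRAM`, `run_warp`, and the "always run the lowest saved pointer first" policy are all made up for illustration. Each "thread" keeps a saved instruction pointer, but on every step only one instruction is issued, and only the lanes whose saved pointer matches it are active.

```python
# Toy model: per-thread saved instruction pointers, one instruction issued per step.
# Everything here (opcodes, scheduling policy) is a hypothetical illustration.

PROGRAM = [
    ("branch_if_odd", 3),  # 0: odd values jump to 3, even values fall through
    ("mul", 2),            # 1: even path
    ("jump", 4),           # 2: even path skips the odd path
    ("add", 100),          # 3: odd path
    ("add", 1),            # 4: both paths meet again here
]


def run_warp(program, values):
    regs = list(values)            # one register per lane
    pcs = [0] * len(regs)          # one saved instruction pointer per "thread"
    trace = []                     # (issued pc, active mask) for each step
    while True:
        live = [pc for pc in pcs if pc < len(program)]
        if not live:
            break
        pc = min(live)             # assumed policy: lowest saved pointer goes first
        mask = [p == pc for p in pcs]
        trace.append((pc, mask))
        op, arg = program[pc]
        for lane, active in enumerate(mask):
            if not active:
                continue           # stalled lane: its saved pointer stays as it is
            if op == "branch_if_odd":
                pcs[lane] = arg if regs[lane] % 2 else pc + 1
            elif op == "jump":
                pcs[lane] = arg
            else:
                if op == "mul":
                    regs[lane] *= arg
                elif op == "add":
                    regs[lane] += arg
                pcs[lane] = pc + 1
    return regs, trace


if __name__ == "__main__":
    regs, trace = run_warp(PROGRAM, [1, 2, 3, 4])
    print(regs)
    for pc, mask in trace:
        print(pc, mask)
```

In this model the odd lanes sit idle with their pointer saved at 3 while the even lanes run instructions 1 and 2, then the roles swap, and all lanes are active again at instruction 4. No two lanes ever execute different instructions in the same step, which is the point both comments are making.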