by 6keZbCECT2uB
1091 days ago
The tensor core mostly accelerates matrix operations. It's the big block in the diagram, and there are 4 of them per SM. "CUDA core" refers to a per-thread execution unit within the SM, which you can see as the FP32 or INT32 units, so there are (32*4) = 128 per SM on that diagram. Like you said, a tensor core is similar to a special-purpose ALU and sits at a lower level of abstraction than something with an instruction pointer.
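To make the counting concrete, here's a rough sketch in plain Python. The constants are just my reading of that diagram (4 blocks per SM, 32 FP32 units and 1 tensor core in each), and `mma_tile` is a hypothetical stand-in for the kind of fused multiply-accumulate a tensor core performs (D = A*B + C on a small tile). The 2x2 tile is purely illustrative; real tile shapes depend on the architecture and data type.

```python
# Assumed figures, taken from reading the SM diagram (not from a spec).
BLOCKS_PER_SM = 4             # the four processing blocks drawn in one SM
FP32_UNITS_PER_BLOCK = 32     # FP32 units ("CUDA cores") in each block
TENSOR_CORES_PER_BLOCK = 1    # one big tensor core block in each


def cuda_cores_per_sm():
    """Count FP32 units across all blocks of one SM."""
    return BLOCKS_PER_SM * FP32_UNITS_PER_BLOCK


def tensor_cores_per_sm():
    """Count tensor cores across all blocks of one SM."""
    return BLOCKS_PER_SM * TENSOR_CORES_PER_BLOCK


def mma_tile(a, b, c):
    """Return D = A*B + C for square tiles given as lists of lists.

    Hypothetical model of a matrix multiply-accumulate: a single
    operation on whole tiles, with no control flow of its own.
    """
    n = len(a)
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    print(cuda_cores_per_sm(), tensor_cores_per_sm())
    print(mma_tile([[1, 2], [3, 4]], [[1, 0], [0, 1]], [[1, 1], [1, 1]]))
```

The point of the second half is that the tensor core is one data-path operation issued by a warp's instruction stream, the same way an add is issued to an FP32 unit. It doesn't fetch or sequence instructions itself.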