| HN Mirror



	by 6keZbCECT2uB 1091 days ago
	The tensor core mostly accelerates matrix operations and is the big block you can see in the diagram; there are 4 per SM. "CUDA core" refers to the per-thread execution lanes in an SM, which you can see as the FP32 or INT32 units, so there are 32*4 = 128 per SM on that diagram. Like you said, a tensor core is similar to a special-purpose ALU and sits at a lower level of abstraction than something with an instruction pointer.
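
To make the counting concrete, here is a rough sketch of the arithmetic. The constants are just the figures read off that diagram (4 blocks per SM, 32 FP32/INT32 lanes and one tensor core per block), and the names are made up for illustration; other GPU generations may lay things out differently.

```python
# Figures as read off the diagram discussed above; treat them as assumptions,
# not datasheet values for any particular GPU.
BLOCKS_PER_SM = 4            # processing blocks (partitions) in one SM
LANES_PER_BLOCK = 32         # FP32/INT32 lanes, i.e. "CUDA cores", per block
TENSOR_CORES_PER_BLOCK = 1   # the one big tensor core block in each partition


def per_sm_counts(blocks=BLOCKS_PER_SM,
                  lanes=LANES_PER_BLOCK,
                  tensor_cores=TENSOR_CORES_PER_BLOCK):
    """Return how many CUDA cores and tensor cores one SM has."""
    return {
        "cuda_cores": blocks * lanes,
        "tensor_cores": blocks * tensor_cores,
    }


counts = per_sm_counts()
print(counts)
```

Under those assumptions you get 128 CUDA cores and 4 tensor cores per SM, which matches the 32*4 figure above.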