by imtringued
260 days ago
Whenever I see code like this, I start to think that GPUs are uniquely unsuited for matrix multiplication. You're pretending that each streaming multiprocessor can handle independent threads, when in reality you're feeding something that only exists once or twice per SM. It's like independently controlling one of 32 cars on a 32-lane highway where the cars aren't allowed to switch lanes and the controls of one car are replicated to all the others, when in reality everyone is sitting in the same bus.