Hacker News new | ask | show | jobs
by shaklee3 2948 days ago
I've found where that causes problems is when on of the matrix dimensions is not a multiple of the cache line size. It's common on gpus to use more elements then there are in the dimension. Nvidia calls this the leading dimension, and it must be greater than or equal to the Matrix dimension. If this is the case, the transpose trick doesn't quite work anymore.