by kpw94
805 days ago
Great links, especially the last one referencing the Goto paper: https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/...

>> I believe the trick with CPU math kernels is exploiting instruction level parallelism with fewer memory references

It's the collection of tricks to minimize all sorts of cache misses (L1, L2, TLB, page miss, etc.), improve register reuse, leverage SIMD instructions, transpose one of the matrices if it provides better spatial locality, etc.
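
To make the transpose point concrete, here is a rough sketch in C, not how any particular BLAS does it. It assumes square row-major matrices of doubles, and the function names `matmul_naive` and `matmul_transposed` are made up for illustration. Real kernels additionally block for L1/L2/TLB, pack panels, and use SIMD, none of which is shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Naive C = A * B for row-major n x n matrices.
 * The inner loop reads B with stride n, so for large n nearly every
 * access to B lands on a different cache line. */
void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Same product, but B is copied into a transposed buffer first, so the
 * inner loop walks both operands contiguously (unit stride).
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int matmul_transposed(size_t n, const double *a, const double *b, double *c) {
    assert(n > 0);
    double *bt = malloc(n * n * sizeof *bt);
    if (bt == NULL)
        return -1;

    /* bt[j][k] = b[k][j] */
    for (size_t k = 0; k < n; k++)
        for (size_t j = 0; j < n; j++)
            bt[j * n + k] = b[k * n + j];

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }

    free(bt);
    return 0;
}
```

Both functions accumulate in the same order, so they should produce identical results; any difference would show up only in running time on matrices too large for cache, and how big that difference is depends on the machine.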