|
|
|
|
|
by brrrrrm
1327 days ago
|
|
Yep! This is an assumed optimization when it comes to modern linear algebra compilers. New primitives go way beyond FMAs: full matrix multiplies on nvidia/Intel and outer product accumulates on Apple silicon. It’s also expected that these are used nearly optimally (or you’ve got a bug). |
|