by ribit
850 days ago
Thank you, that's very insightful and makes perfect sense! I do wonder, however, why Nvidia and Intel chose not to expose an AXPY/outer-product instruction if they use these kinds of operations under the hood. I can imagine them being useful in their own right. My best guess is that this gives them the freedom to change the implementation details later on (e.g. the order of swizzles)?
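For context on what such an instruction would compute: a matrix product C += A·B can be built from k rank-1 (outer-product) updates, and each rank-1 update is just one AXPY (y ← a·x + y) per row of C. The sketch below is pure Python for illustration only. The function names are hypothetical, and it says nothing about how Nvidia's or Intel's hardware actually orders or swizzles these steps.

```python
# Sketch: C = A @ B expressed as k rank-1 (outer-product) updates.
# Each outer-product step is a batch of AXPY operations (y <- a*x + y),
# one per row of C. Hypothetical helper names; illustration only, not
# how any particular GPU or CPU implements its matrix instructions.

def axpy(a, x, y):
    """Return a*x + y for scalar a and equal-length lists x, y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def outer_product_accumulate(col, row, c):
    """Rank-1 update c[i][j] += col[i] * row[j], done as one AXPY per row."""
    return [axpy(col[i], row, c[i]) for i in range(len(col))]

def matmul_by_outer_products(a, b):
    """Compute A @ B (m x k times k x n) as a sum of k outer products."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for p in range(k):
        col = [a[i][p] for i in range(m)]  # p-th column of A
        row = b[p]                         # p-th row of B
        c = outer_product_accumulate(col, row, c)
    return c

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_by_outer_products(A, B))  # [[19, 22], [43, 50]]
```

Exposing the AXPY/outer-product step directly would pin down this decomposition, whereas exposing only the full tile multiply leaves the vendor free to reorder the k updates internally.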