by ribit 850 days ago
Thank you, very insightful, and it makes perfect sense! I do wonder, however, why Nvidia and Intel chose not to expose an AXPY/outer-product instruction if they use these kinds of operations under the hood. I can imagine such instructions being useful in their own right. My best guess is that keeping them hidden leaves the vendors free to change the implementation details later on (e.g. the order of swizzles)?
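
For anyone unfamiliar with the terms, here is a rough sketch of what I mean by AXPY and by building a matrix product out of outer products. It is plain Python, the names `axpy` and `matmul_outer` are made up for illustration, and it only shows the math, not how Nvidia or Intel hardware actually does it:

```python
def axpy(a, x, y):
    """Return a*x + y elementwise (the BLAS-style AXPY operation)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def matmul_outer(A, B):
    """Compute C = A @ B by accumulating one rank-1 (outer product) update per k."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for k in range(n):
        # Outer product of column k of A with row k of B,
        # applied to C as one AXPY per row.
        for i in range(m):
            C[i] = axpy(A[i][k], B[k], C[i])
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_outer(A, B))  # [[19, 22], [43, 50]]
```

Each pass of the outer loop adds one full outer product to C, which is the kind of operation I would have expected to see as a standalone instruction.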