by ffast-math 1458 days ago
IMO it would be super cool and I hope someone does it. There are a lot of interesting tradeoffs around which techniques to use for which matrix sizes and under which assumptions about read vs write ratios, what you have a training set for, whether you can fuse compression into previous ops, etc.
1 comments

hm... so maybe a better place would be one of these toolkits like JAX, where the entire computation is known at optimization time, whereas a BLAS would potentially have to do some heroic heuristics to try to fully optimize underneath the BLAS interface.
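
To make the "entire computation is known" point concrete, here is a rough sketch of what JAX sees when it traces a function. The function `layer` and the shapes are made up for illustration; `jax.make_jaxpr` and `jax.jit` are the real tracing and compilation entry points. Nothing here does approximate matmul, it only shows that the matmul and the op after it arrive at the compiler as one program, which is the sort of context a BLAS call never gets.

```python
import jax
import jax.numpy as jnp


def layer(x, w):
    # Hypothetical example: a matmul followed by an elementwise op.
    # A BLAS routine would only ever see the matmul part.
    return jnp.tanh(x @ w)


x = jnp.ones((4, 8))
w = jnp.ones((8, 3))

# make_jaxpr traces the function and returns the whole program
# (matmul plus the following tanh) as one intermediate representation.
jaxpr_text = str(jax.make_jaxpr(layer)(x, w))
print(jaxpr_text)

# jit hands that same traced program to XLA as a single unit,
# so any rewrite or fusion can look across op boundaries.
out = jax.jit(layer)(x, w)
```

In principle a pass at this level could decide per matmul whether a compressed or approximate version is worth it, since it can see the shapes and the neighbouring ops, but that is speculation on my part and not something JAX does out of the box.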