Hacker News new | ask | show | jobs
by semessier 209 days ago
without having implemented inference, just by looking at it from a math perspective this is base linear algebra/BLAS. I am very much wondering what a lean inference optimized API with covering 80% of all use cases across dtypes and sparsity would look like. Probably a far cry from what's in CUDA and probably all that's needed for practical inference.