by semessier
209 days ago
Without having implemented inference myself, and just looking at it from a math perspective, this is basic linear algebra/BLAS. I very much wonder what a lean, inference-optimized API covering 80% of all use cases across dtypes and sparsity would look like. Probably a far cry from what's in CUDA, and probably all that's needed for practical inference.
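
To make the "it's just BLAS" point concrete, here's a minimal sketch (pure Python, no real BLAS, helper names are hypothetical) of a two-layer MLP forward pass. Each layer is one GEMM followed by a bias add and an elementwise activation. It only covers dense float math; the dtype and sparsity variants mentioned above are exactly where a real API would grow.

```python
from typing import List

Matrix = List[List[float]]


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Naive C = A @ B: the core of BLAS GEMM (no alpha/beta scaling)."""
    n, k = len(a), len(a[0])
    assert len(b) == k, "inner dimensions must match"
    m = len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def add_bias(x: Matrix, bias: List[float]) -> Matrix:
    """Broadcast-add a bias vector to every row."""
    return [[v + b for v, b in zip(row, bias)] for row in x]


def relu(x: Matrix) -> Matrix:
    """Elementwise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def mlp_forward(x: Matrix, w1: Matrix, b1: List[float],
                w2: Matrix, b2: List[float]) -> Matrix:
    # Layer 1: GEMM + bias + activation
    h = relu(add_bias(gemm(x, w1), b1))
    # Layer 2: GEMM + bias
    return add_bias(gemm(h, w2), b2)


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    w1 = [[1.0, -1.0], [0.5, 1.0]]
    w2 = [[1.0], [3.0]]
    print(mlp_forward(x, w1, [0.0, -2.0], w2, [0.5]))  # [[2.5]]
```

Swap the naive `gemm` for a call into an optimized BLAS and you have most of the compute; everything else here is cheap elementwise work.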