| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by semessier 256 days ago
	without having implemented inference, just by looking at it from a math perspective this is base linear algebra/BLAS. I am very much wondering what a lean inference optimized API with covering 80% of all use cases across dtypes and sparsity would look like. Probably a far cry from what's in CUDA and probably all that's needed for practical inference.