Hacker News
by hasperdi 126 days ago
Why distill if you can run the full model yourself, or through another inference provider?

Quantization is the better approach in most cases, unless you want to, for instance, create hybrid models, i.e. distill from several different models.