by zozbot234 76 days ago
> These models are dumber and slower than API SoTA models and will always be.

Sure, but per-token costs on the SoTA models are roughly an order of magnitude higher than what you'd pay for third-party inference on the locally available models. So once you account for per-token cost, the math skews the other way.
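
To make the arithmetic concrete, here's a rough sketch. The prices and token counts are made-up placeholders (not real quotes from any provider), picked only so the price gap is about 10x as claimed above. `cost_per_task` and the retry count are hypothetical too; retries are just a crude stand-in for "dumber".

```python
def cost_per_task(price_per_mtok: float, tokens_per_attempt: int, attempts: float) -> float:
    """Dollar cost of finishing one task, given a price per million tokens."""
    return price_per_mtok * tokens_per_attempt * attempts / 1_000_000


# Hypothetical placeholder prices in USD per million tokens, ~10x apart.
SOTA_PRICE = 10.0
OPEN_PRICE = 1.0

# Assumption: the weaker model needs 3 attempts where the SoTA model needs 1.
sota = cost_per_task(SOTA_PRICE, 50_000, attempts=1)
open_ = cost_per_task(OPEN_PRICE, 50_000, attempts=3)

# Number of attempts at which the cheaper model stops being cheaper.
break_even_attempts = SOTA_PRICE / OPEN_PRICE

print(f"SoTA: ${sota:.2f}  open: ${open_:.2f}  break-even: {break_even_attempts:.0f} attempts")
```

Under those assumptions the cheaper model can burn several times the tokens and still come out ahead; it only loses once it needs more attempts than the price ratio. This ignores latency and the value of your own time, which is where the "slower" part of the objection would bite.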