|
|
|
|
|
by vkkhare
397 days ago
|
|
Isn't the inference cost of running these models at scale challenging? Currently it feels like small LLMs (1B-4B) are able to perform well for simpler agentic workfows. There are definitely some constraints but surely much easier than to pay for big clusters on cloud running for these tasks. I believe it distributes the cost more uniformly |
|
We'll see companies push for tiny on-device models as a novelty, but even the best of those aren't very good. I firmly believe that GPUs are going to stay relevant even as models scale down, since they're still the fastest and most power-efficient solution.