by cmrdporcupine 3 hours ago
I think the model they chose is out of date and hard to sell, but there are plenty of use cases where today's dumb small models are fine. A Qwen 3.5/3.6 or Gemma 3 model on silicon at those speeds would be genuinely world changing even if it's only 1-3B params. Such a model at those speeds will remain extremely useful even over a 5-6 year timespan, I think.

If you consider the places you could deploy it -- with no network access, and at those high speeds -- it's very useful for adding vague "common sense" fuzzy thinking to all kinds of applications that right now piss consumers off with poor UX. Especially if the model can do voice-to-text and text-to-speech well (some of the smaller models can).

I wouldn't be surprised if "fast, cheap, dumb" ends up being the market for LLMs.

The state-of-the-art models aren't at "can fully replace knowledge worker" levels yet and I doubt they'll get there any time soon, so charging $2000 / month for access isn't going to happen. Right now everyone and their dog is being handed subsidized credits to play with AI, but the actual outcome is rarely good enough to be worth the money they'd need to charge for it. It might very well take another order of magnitude or two to get LLMs to be truly good (if it is even possible at all), and considering how much money is already being pumped into it I just don't see that happening.

On the other hand, the dumb models are more than adequate for simple noncritical tasks, like directing a user to the appropriate FAQ entry, or acting as a phone decision tree. There's a lot of money in making chatbot assistants actually useful, or in augmenting website search. Turning it into a glorified "language-to-API-call" translator doesn't take a lot of smarts, but as long as it's cheap you can make a killing on volume.

> On the other hand, the dumb models are more than adequate for simple noncritical tasks, like directing a user to the appropriate FAQ entry

This is a lane I’ve been experimenting in -- seeing what I can get out of models that fit in 16GB of VRAM for simple tasks (screen scraping, decision tree navigation, natural language queries). It’s interesting for sure (it certainly reveals the limits of non-determinism) and promising for low-criticality tasks where the output can be reviewed, but I also feel like I need better sources and a better community for understanding and reflection, preferably ones that aren’t hype channels. Any pointers?