| HN Mirror


by SR2Z 60 days ago

> The chat interface is all that there is and their behavior in chat is not deterministic or bounded enough to be useful in most applications.

Their behavior in chat is not deterministic; it's stochastic. That is the point: the usefulness of LLMs comes from their ability to deal with the vagaries of language.

> But because the LLM cannot be trusted to behave properly around the topic, they have to filter anything which touches it.

IMO this is because giving a random person a frontier LLM is like giving them a Ferrari. Most people would manage to not crash it. A few would experiment with it and learn how to drive it very well. A few more would immediately assume that a fast car means they can drive it fast and end up wrapped around a telephone pole.

We get lots of mileage out of other stochastic systems. I've worked on a lot of projects that did, and the defining trait that made them successful doesn't seem to have a name, but the closest I can come up with is "boosting." In ML (esp. in classical ML), boosting is when you train one classifier to predict the residual error of another. The first classifier minimizes some loss (cross-entropy, say), and then the second contributes additional bits on top of it.
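To make the "second model predicts the residual of the first" idea concrete, here is a minimal sketch in plain Python. It is a toy: it uses squared error on a regression problem rather than a classifier with an entropy loss, and the data, `fit_stump`, and `sse` are all made up for illustration.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    # Try every threshold that leaves at least one point on each side.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def sse(model, xs, ys):
    """Sum of squared errors of a model on the data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Made-up data with three rough plateaus; a single stump cannot fit all three.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 4.8]

# Stage 1: a weak model fit to the raw targets.
first = fit_stump(xs, ys)

# Stage 2: a second weak model fit to what the first one got wrong.
residuals = [y - first(x) for x, y in zip(xs, ys)]
second = fit_stump(xs, residuals)


def boosted(x):
    """The combined prediction: first model plus its learned correction."""
    return first(x) + second(x)


print("first model SSE:", sse(first, xs, ys))
print("boosted SSE:    ", sse(boosted, xs, ys))
```

The second stump only has to explain what is left over, which is a smaller problem than the original one. That is the property the human-in-the-loop analogy leans on.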

In a system with a human-in-the-loop, it often takes a lot of engineering to allow the human to boost the output of a system. I once worked for a company where we had to very precisely label maps based on real-world data. We had a model that could produce a sometimes-accurate polygon, but just asking a person to adjust the polygon after the model generated it was terrible: it was a vague ask that took a lot of time and effort. Instead, we gave users a brush tool and trained a new model to fix the polygon based on their brush strokes. A simpler example was a system for reviewing user reports: we tuned our system to auto-approve reports only when it could do so with high precision, and used a human review queue for the rest. Reducing the number of bits of entropy a human being had to contribute to a decision in the average case allowed us to smoothly iterate on the model while staying flexible.
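The report-review setup can be sketched roughly like this. Everything here is hypothetical: the `Report` fields, the `approve_score` produced by some upstream model, and the 0.95 threshold stand in for whatever the real system used. In practice the threshold would be chosen on held-out data so that auto-approvals hit the target precision.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    approve_score: float  # hypothetical model confidence that the report is valid


# Hypothetical cutoff; pick it from validation data to hit a target precision.
APPROVE_THRESHOLD = 0.95


def route(reports, threshold=APPROVE_THRESHOLD):
    """Auto-approve confident reports and queue the rest for a human."""
    auto_approved, review_queue = [], []
    for report in reports:
        if report.approve_score >= threshold:
            auto_approved.append(report)
        else:
            review_queue.append(report)
    return auto_approved, review_queue


sample = [
    Report("r1", 0.99),
    Report("r2", 0.60),
    Report("r3", 0.97),
    Report("r4", 0.10),
]
approved, queue = route(sample)
print("auto-approved:", [r.report_id for r in approved])
print("human review: ", [r.report_id for r in queue])
```

Raising the threshold trades human workload for fewer bad auto-approvals. As the model improves, more reports clear the bar and the queue shrinks without changing the workflow.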

The AI companies that are actually going to deliver useful products will be the ones that engineer interfaces that quickly allow human beings to refine LLM outputs. It's going to be a long time before any of these models can reliably one-shot a complex task with ambiguous parameters. Chat is only one possible way to do this, and frankly it's not a very good one. I think that this is the point the article was trying to make, minus the corpspeak and hype.