| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dsign 316 days ago

This is true.

But I've seen some harnesses (i.e., whatever Gemini Pro uses) do impressive things. The way I model it is like this: an LLM, like a person, has a chance to produce wrong output. A quorum of people and some experiments/study usually arrives to a "less wrong" answer. The same can be done with an LLM, and to an extent, is being done by things like Gemini Pro and o3 and their agentic "eyes" and "arms". As the price of hardware and compute goes down (if it does, which is a big "if"), harnesses will become better by being able to deploy more computation, even if the LLM models themselves remain at their current level.

Here's an example: there is a certain kind of work we haven't quite yet figured how to have LLMs do: creating frameworks and sticking to them, e.g. creating and structuring a codebase in a consistent way. But, in theory, if one could have 10 instances of an LLM "discuss" if a function in code conforms to an agreed convention, well, that would solve that problem.

There are also avenues of improvement that open with more computation. Namely, today we use "one-shot" models... you train them, then you use them many times. But the structure, the weights of the model aren't being retrained on the output of their actions. Doing that in a per-model-instance basis is also a matter of having sufficient computation at some affordable price. Doing that in a per-model basis is practical already today, the only limitation are legal terms, NDAs, and regulation.

I say all of this objectively. I don't like where this is going; I think this is going to take us to a wild world where most things are gonna be way tougher for us humans. But I don't want to (be forced to) enter that world wearing rosy lenses.