by tekne 47 days ago
Wait... why?

Making an unreliable, nondeterministic system give reliable results for a bounded task with well-understood parameters is... like half of engineering, no?

There's a huge difference between "generate this code, here's a vague feature description" and "here's a list of criteria, assign this input to one of these buckets" -- the latter is obviously subject to prompt engineering, hallucination, etc. -- but so is a human pipeline!

2 comments

>the latter is obviously subject to prompt engineering, hallucination, etc. -- but so is a human pipeline!

...which is why we write deterministic code to take the human out of the pipeline. One of the early uses of computers was calculating firing tables for artillery, to replace teams of humans that were doing the calculations by hand (and usually with multiple humans performing each calculation to catch errors). If early computers had a 99% chance of hallucinating the wrong answer to an artillery firing table, the response from the governments and militaries that used them would not be to keep using computers to calculate them. It would be to go back to having humans do it with lots of manual verification steps and duplicated work to be sure of the results.

If you're trying to make LLMs (a vague simulacrum of humans) with their inherent and unsolvable[1] hallucination problems replace deterministic systems, people are going to eventually decide to return to the tried and true deterministic systems.

1: https://arxiv.org/abs/2401.11817

So how did we deal with the human mistakes? You mentioned it:

- Get humans to check each other's work

- Systematize the process -- breaking it down into smaller and smaller tasks where the likelihood of mistakes decreases

- Replace as much as possible with deterministic code

There's absolutely no reason you can't do this with LLMs -- and it might help quite a bit, since LLMs are cheap. There are also hybrid systems, where human checkers are replaced or augmented with LLM checkers.
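A minimal sketch of the "check each other's work" idea, assuming each checker is just a function that returns a label. The names `majority_vote` and `classify_with_redundancy` are made up for illustration, and the checkers could be LLM calls, humans, or deterministic rules. When there is no strict majority, the result is `None`, which a caller could treat as "escalate to a human."

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by a strict majority, or None if there is none."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    # Require more than half the votes so that ties never produce an answer.
    return label if count * 2 > len(labels) else None


def classify_with_redundancy(item, checkers):
    """Run every (hypothetical) checker on the item and combine the votes."""
    votes = [check(item) for check in checkers]
    return majority_vote(votes)
```

Whether this actually helps depends on the checkers making reasonably independent mistakes; if they all fail the same way, voting only hides the error.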

For example -- I have an LLM check all my scientific papers for typos and minor errors. It's caught quite a few, and when it caught something that was not actually an error, it was usually something which would benefit from clarification anyway.

Now -- if I could afford to pay a grad student to do that, that would be even better! But I can't, and even if I could, not all the work which warrants a few cents of tokens warrants a few hundred dollars of tedious grad student labor -- especially when the latter has a very strong incentive to say LGTM (nothing here is life critical!)

Likewise, we could imagine:

- A deterministic process with a heuristic + an LLM in the loop checking, for example -- "is this likely correct?" -- perhaps escalating to a human (or a bigger LLM) in case of anomaly. I can see this being amazingly useful for automated refactors.

- Automatic paperwork/customer service processing -- if the cost of a failure can be bounded (say $X) and testing shows failures are acceptably rare (say Y% of the time) -- it might be cheaper to run an AI system and eat that cost, especially if continuous monitoring lets you know when you have to "shut it down."

In both cases -- there's nothing stopping an LLM from potentially having better-than-human average performance, and perhaps delegating real edge cases to actual experts. Remember: you're not competing with motivated PhDs, you're competing with minimum wage labor reading a list of instructions which is like a prompt except poorly formatted and missing steps.
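The bounded-cost argument is just expected-value arithmetic. A rough sketch, where every number and function name is a made-up placeholder rather than real data:

```python
def expected_cost_per_item(processing_cost, failure_rate, failure_cost):
    """Cost of handling one item plus the average cost of getting it wrong."""
    return processing_cost + failure_rate * failure_cost


def break_even_failure_rate(llm_cost, human_cost, human_failure_rate, failure_cost):
    """Failure rate at which the LLM pipeline costs the same as the human one.

    Solves: llm_cost + r * failure_cost == human_cost + human_failure_rate * failure_cost
    """
    return human_failure_rate + (human_cost - llm_cost) / failure_cost


# Hypothetical figures: $0.02 per LLM call, $2.00 per human-handled item,
# humans wrong 1% of the time, each failure costs $50.
threshold = break_even_failure_rate(0.02, 2.00, 0.01, 50.0)  # about 0.0496
```

With those placeholder numbers, the LLM pipeline comes out cheaper as long as it fails on fewer than roughly 5% of items. This ignores monitoring costs and assumes the failure cost really is bounded.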

Because it's not possible. There is nothing you can say to the LLM that will guarantee that something happens. It's not how it works. It will maybe be taken into consideration if you're lucky.

But if you're trying to tell me that every time you list criteria you get them all perfectly matched, you're clearly gifted.

I'm being deliberately pedantic, but depending on what kind of representation we use for the neural network (due to rounding) as well as the choice of inference (that is, given a distribution for next token, which one to choose), it can absolutely be reproducible and completely deterministic.

It is chaotic, though, which I believe is the better word here: a single-letter change in the input may produce wildly different results.

We just choose to use more random inference rules, because they have better results.
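To make the distinction concrete, here is a toy sketch of two ways to pick the next token from a list of scores. This is not any provider's API; `greedy_pick` and `sample_pick` are illustrative names, and the logits are assumed to be a plain list of floats.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Deterministic rule: always take the index with the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature, rng):
    """Random rule: draw an index according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

`greedy_pick` gives the same answer every time for the same scores. `sample_pick` only repeats itself if the random generator is seeded identically, e.g. two separate `random.Random(42)` instances.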

With determinism you're not wrong. The problem is that you'd need to make sure all your seeds, temperatures, and other input parameters are exactly the same, and importantly that all context is cleared. But people don't do that. And I'm not sure if every provider, or even any, lets you set those parameters.
Even with temperature set to zero, I believe you may still get non-determinism because FP operations are not associative (the order in which parallel sums are accumulated can change the result), so what I am talking about (as mentioned, very pedantically) is mostly the theory.
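A quick illustration of the floating-point issue, in plain Python with no assumptions beyond standard IEEE 754 doubles: swapping the operands of a single addition changes nothing, but regrouping a sum of three numbers can.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

# Grouping changes the result, even though each individual addition commutes.
grouping_matters = left != right      # True
order_matters = (a + b) != (b + a)    # False
```

On parallel hardware the grouping of a large sum can vary from run to run, which is one plausible way tiny differences creep in and occasionally flip which token has the highest score.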
"There is nothing you can say to the person that will guarantee that something happens"