Hacker News
by JCTheDenthog 47 days ago
>the latter is obviously subject to prompt engineering, hallucination, etc -- but so can a human pipeline!

...which is why we write deterministic code to take the human out of the pipeline. One of the early uses of computers was calculating firing tables for artillery, to replace teams of humans that were doing the calculations by hand (and usually with multiple humans performing each calculation to catch errors). If early computers had a 99% chance of hallucinating the wrong answer to an artillery firing table, the response from the governments and militaries that used them would not be to keep using computers to calculate them. It would be to go back to having humans do it with lots of manual verification steps and duplicated work to be sure of the results.

If you're trying to make LLMs (a vague simulacrum of humans) with their inherent and unsolvable[1] hallucination problems replace deterministic systems, people are going to eventually decide to return to the tried and true deterministic systems.

1: https://arxiv.org/abs/2401.11817


So how did we deal with the human mistakes? You mentioned it:

- Get humans to check each other's work

- Systematize the process -- breaking it down into smaller and smaller tasks where the likelihood of mistakes decreases

- Replace as much as possible with deterministic code

There's absolutely no reason you can't do this with LLMs -- and it might help quite a bit since LLMs are cheap. There are also hybrid systems -- where human checkers are replaced or augmented with LLM checkers.
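A rough way to see why "check each other's work" helps, and why cheap checkers make more redundancy affordable: run the same task through several independent workers (humans or LLM calls) and take a majority vote. The sketch below assumes each worker is wrong independently with the same probability `p`, and pessimistically assumes every wrong answer is the *same* wrong answer. Independence is optimistic for repeated samples from one LLM, whose errors are often correlated. `majority_vote` and `majority_error_prob` are illustrative names, not from any library.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def majority_vote(answers: Sequence[Hashable]) -> Hashable:
    """Return the most common answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]


def majority_error_prob(p: float, n: int) -> float:
    """Probability that a strict majority of n independent workers are wrong,
    if each is wrong with probability p (worst case: all wrong answers agree)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    k_min = n // 2 + 1  # smallest number of wrong workers that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))
```

For example, `majority_vote(["42.1", "42.1", "41.7"])` returns `"42.1"`, and with a 1% per-worker error rate three voters bring the majority error down to `majority_error_prob(0.01, 3)`, about 0.0298%. The gain only holds while `p` is well below 0.5; a checker that is usually wrong makes the vote worse, not better.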

For example -- I have an LLM check all my scientific papers for typos and minor errors. It's caught quite a few, and when it caught something that was not actually an error, it was usually something which would benefit from clarification anyways.

Now -- if I could afford to pay a grad student to do that, that would be even better! But I can't, and if I could, not all the work which warrants a few cents of tokens warrants a few hundred dollars of tedious grad student labor -- especially when the latter has a very strong incentive to say LGTM (nothing here is life critical!)

Likewise, we could imagine:

- A deterministic process with a heuristic + an LLM in the loop checking, for example -- "is this likely correct?" -- perhaps escalating to a human (or a bigger LLM) in case of anomaly. I can see this being amazingly useful for automated refactors.

- Automatic paperwork/customer service processing -- if the cost-of-failure can be bounded (say $X) and testing shows failure happens rarely enough on average (say Y% of the time) -- it might be cheaper to run an AI system and eat that cost, especially if continuous monitoring lets you know if you have to "shut it down."
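The "eat that cost" argument in the second example is an expected-value comparison: expected cost per item is the running cost plus the failure rate (Y%) times the bounded cost of a failure ($X). The sketch below compares two processes that way and adds a naive monitoring rule for "shut it down". All numbers are made up for illustration, and `Process`, `cheaper`, `should_shut_down`, and the `tolerance=2.0` factor are hypothetical choices, not an established method.

```python
from dataclasses import dataclass


@dataclass
class Process:
    cost_per_item: float     # $ to handle one item (tokens, or wages)
    failure_rate: float      # fraction of items handled wrongly, from testing (Y%)
    cost_per_failure: float  # bounded $ cost of one failure ($X)

    def expected_cost(self) -> float:
        # Expected $ per item = running cost + P(failure) * cost of that failure.
        return self.cost_per_item + self.failure_rate * self.cost_per_failure


def cheaper(ai: Process, human: Process) -> str:
    """Pick whichever process has the lower expected cost per item."""
    return "ai" if ai.expected_cost() < human.expected_cost() else "human"


def should_shut_down(failures: int, items: int, tested_rate: float,
                     tolerance: float = 2.0) -> bool:
    """Hypothetical monitoring rule: stop the AI system if its observed failure
    rate in production exceeds `tolerance` times the rate measured in testing."""
    if items == 0:
        return False
    return failures / items > tolerance * tested_rate


# Illustrative numbers only (not measurements).
ai = Process(cost_per_item=0.05, failure_rate=0.02, cost_per_failure=50.0)
human = Process(cost_per_item=4.00, failure_rate=0.01, cost_per_failure=50.0)
```

Here the AI process fails twice as often but still wins (about $1.05 vs $4.50 expected per item). Raise `cost_per_failure` to 500 for both and the human process wins instead, which is why bounding the cost of failure matters. A fixed-ratio threshold like `should_shut_down` is noisy on small samples; a real monitor would likely want a proper statistical test.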

In both cases -- there's nothing stopping an LLM from potentially having better-than-human average performance, and perhaps delegating real edge cases to actual experts. Remember: you're not competing with motivated PhDs, you're competing with minimum wage labor reading a list of instructions which is like a prompt except poorly formatted and missing steps.
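The escalation idea from the first example (a cheap heuristic, then an LLM asking "is this likely correct?", then a bigger LLM, then a human) can be sketched as a chain of checkers ordered by cost, where the first one that is confident enough decides and anything left over goes to a person. The checkers below are hypothetical stubs standing in for a linter and two model calls; no real LLM API is used. Treating a self-reported confidence as the anomaly signal is itself an assumption, since LLM confidence is not necessarily well calibrated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    answer: str        # e.g. "accept" / "reject" for a proposed refactor
    confidence: float  # checker's self-reported confidence in [0, 1]


Checker = Callable[[str], Verdict]
Tier = Tuple[str, Checker, float]  # (name, checker, confidence threshold)


def triage(change: str, tiers: List[Tier]) -> Tuple[str, Optional[str]]:
    """Run checkers from cheapest to most expensive. The first tier that is
    confident enough decides; if none is, the change goes to a human."""
    for name, check, threshold in tiers:
        verdict = check(change)
        if verdict.confidence >= threshold:
            return name, verdict.answer
    return "human", None


# Hypothetical stand-ins; real tiers might be a linter, a small model, a large model.
def heuristic(change: str) -> Verdict:
    # Trust pure renames outright; abstain (confidence 0) on anything else.
    return Verdict("accept", 1.0 if change.startswith("rename ") else 0.0)


def small_llm(change: str) -> Verdict:
    # Placeholder for a cheap model asked "is this likely correct?"
    return Verdict("accept", 0.6)


def big_llm(change: str) -> Verdict:
    # Placeholder for a larger model: confident only on short changes.
    return Verdict("accept", 0.95 if len(change) < 40 else 0.5)


TIERS: List[Tier] = [
    ("heuristic", heuristic, 0.99),
    ("small_llm", small_llm, 0.9),
    ("big_llm", big_llm, 0.9),
]
```

With these stubs, `triage("rename foo -> bar", TIERS)` is settled by the heuristic, `triage("inline helper in parser", TIERS)` escalates to `big_llm`, and a long change falls through to `("human", None)`. Ordering tiers by cost means the expensive reviewers only see the cases the cheap ones could not settle.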