by samantha-wiki 498 days ago
I strongly disagree with this. “Computers” would not have been replaced by the machines that replaced them if those machines routinely produced incorrect results.

One could argue that my position does not apply to applications where correctness is not critical; however, that is not the analogy the article is making.

2 comments

The rate at which LLMs are "routinely producing incorrect results" is trending downward as we get more advanced reasoning models with test-time compute.

I don't know whether you used some of the more recent models like Claude 3.5 Sonnet and o1. But to me it is very clear where the trajectory is headed. o3 is just around the corner, and o4 is currently in training.

People found value even in a model like GPT-3.5 Turbo, and that thing was really bad. But hey, at least it could write some short scripts and boilerplate code.

You are also comparing mathematical computation - which has only 1 correct solution - with programming, where the solution space is much broader. There are multiple valid solutions. Some are more optimal than others. It is up to the human to evaluate that solution, as I've said in the post. Today, you may even need to fix the LLM's output. But in my experience, I'm finding I need to do this far less often than before.

Wait, what? Human programmers produce incorrect results all the time; they are called bugs. If anything, we use automated systems when correctness is important: fuzzers, static analyzers, etc. And the "AI" systems are improving by leaps and bounds every month; look at SWE-Bench [1], for example. It's pretty obvious where this is all going.

[1] https://www.swebench.com/

Sure, people make mistakes all the time. But would you prefer those mistakes be sprinkled randomly throughout your data crunching, or be systematic errors?

The point that post is making is that a machine isn't going to make a mistake in adding two numbers. It reduces arithmetic errors to 0 (unless you count overflow, which can be detected), and if it didn't, it would only be useful in the rare case where you don't care about accuracy.
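
To make the "overflow can be detected" part concrete, here is a minimal sketch. It assumes a signed 64-bit machine word, and `checked_add` is a made-up helper name for illustration, not something from the article. Python integers are arbitrary precision, so the sum itself is exact and the range check stands in for what the hardware overflow flag would report.

```python
# Bounds of a signed 64-bit integer (an assumption; pick whatever width applies).
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def checked_add(a: int, b: int) -> int:
    """Add two signed 64-bit values, raising instead of silently wrapping."""
    for value in (a, b):
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError(f"{value} does not fit in a signed 64-bit integer")

    # Python ints never overflow, so this sum is mathematically exact.
    result = a + b

    # If the exact result does not fit in 64 bits, report it as an overflow.
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{a} + {b} overflows a signed 64-bit integer")
    return result
```

So the failure mode is either the exact answer or a loud error, which is a very different thing from an answer that is quietly wrong some fraction of the time.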

AI in its current state does not do for logical accuracy what computers did for arithmetic accuracy; you still need to verify every output from an LLM, which I doubt you've done for the many billions of arithmetic operations that happened this second on the computer you're on right now.

edit: fixed typo

Human computers vs. electronic computers.

Think of electronic calculators. They have a significantly lower error rate than human calculators, and the difference is both statistically and practically significant.