Hacker News
by ozgrakkurt 3 days ago
This disregards the entire mechanism by which LLMs work. How close to this ideal is current frontier-level AI? If you do a cost/improvement analysis, does it look like it can reach a usable threshold?

I don’t know the numbers, but as a user, it seems impossible for it to be useful without expert review. It is also debatable whether it brings any value once you consider the cost of building and using LLMs and the expert’s time. You also need to include the opportunity cost of the expert reviewing slop instead of creating work themselves, and the long-term consequences of this for the expert.

1 comment

Hallucination rates go down by a few percent with each new model generation, although some milestones have seen regressions, e.g. reasoning models are just overall worse. It is a bit hard to measure and compare across generations because the tests have to change in order to stay ahead of the training data, but the models are generally improving. Check out HalluHard, it’s a stress-test hallucination benchmark.

https://halluhard.com/

There will never be a time when there are zero hallucinations because of the non-deterministic nature of LLMs, but eventually the frequency of hallucinations will be so low that it doesn’t matter. If the robot makes one mistake for every ten a human makes, that’s coming out ahead (depending on the nature of the mistake, of course).

Also, I’m not making any value judgement about the technology or how it’s used and, frankly, I don’t really appreciate the assumption that I am. I’m fucking job hunting right now and it’s a hellscape thanks to LLMs.

I’m just being real about where the tech appears to be going based on its current trajectory and my experience in the industry. There seems to be a lot of cope going around that these things won’t ever be good enough to take our jobs. They will, and sooner than any of us are ready for. Leadership is fine with slop as long as it ships, so the tech as it stands today doesn’t have to be much better to reach that standard.