Hacker News
by jnovek 3 days ago
LLMs will absolutely be like that. The speed this technology is moving at makes me certain, especially over a period of 10-20 years; 20 years ago I was bugging friends for a Gmail invite and AI was a joke left to academics.

I think it will even be solved soon, like, within the next 18 to 36 months. Hallucinations are the biggest problem consumers have with LLMs and a solution to that would be instantly worth billions of dollars. I’m sure every company in this space is desperately trying to figure it out before everyone else.

A non-deterministic system will always make mistakes, but we’ll hit a target where LLMs make fewer mistakes than humans and that will be good enough for almost all applications.

2 comments

I don't know if you can "fix" hallucinations without changing the fundamental architecture. The other factor in this article is that prior to the AI summary at the top, Google could simply state that it was an error on the part of the website owner. Now it is being held liable for whatever the summary states - even if it's more accurate, it can still be wrong enough times to be expensive.
> I don't know if you can "fix" hallucinations without changing the fundamental architecture.

Exactly. "Hallucinations" are not some special case. They are the LLM working as designed.

They don’t need to be eliminated or “fixed”, they just need to be less frequent than human error.
This is disregarding the entire mechanism by which LLMs work. How close to this ideal are the current frontier-level AI models? If you do a cost/improvement analysis, does it look like they can reach a usable threshold?

I don’t know the numbers, but as a user, it seems impossible for it to be useful without expert review. It is also debatable whether it brings any value once you consider the cost of building and using LLMs and the expert’s time. You also need to include the opportunity cost of the expert reviewing slop instead of creating work themselves, and the long-term consequences of this for the expert.

Hallucination rates go down by a few percent with each new model generation, although some milestones have seen regressions, e.g. reasoning models are just overall worse. It is a bit hard to measure and compare across generations because the tests have to change in order to stay ahead of training data, but they are generally improving. Check out HalluHard, it’s a stress-test hallucination benchmark.

https://halluhard.com/

There will never be a time when there are zero hallucinations because of the non-deterministic nature of LLMs, but eventually the frequency of hallucinations will be so low that it doesn’t matter. If the robot makes one mistake for every ten a human makes, that’s coming out ahead (depending on the nature of the mistake, of course).
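
To put rough numbers on the "depending on the nature of the mistake" caveat, here is a back-of-envelope sketch. Every figure in it (error rates, cost per mistake, task count) is made up for illustration, and `expected_cost` is a hypothetical helper, not a measurement of any real model or workforce.

```python
def expected_cost(error_rate: float, cost_per_error: float, tasks: int) -> float:
    """Expected total cost of mistakes over a number of tasks."""
    return error_rate * cost_per_error * tasks


# Hypothetical numbers, for illustration only.
TASKS = 1000

# A human who gets 1 in 10 tasks wrong, each mistake costing 100 units.
human = expected_cost(error_rate=0.10, cost_per_error=100.0, tasks=TASKS)

# An LLM with a tenth of the human error rate and equally costly mistakes.
llm_same_severity = expected_cost(error_rate=0.01, cost_per_error=100.0, tasks=TASKS)

# Same low error rate, but each mistake is assumed to be 20x more expensive
# (e.g. it is confidently wrong and nobody catches it).
llm_worse_severity = expected_cost(error_rate=0.01, cost_per_error=2000.0, tasks=TASKS)

print(f"human:              {human:.0f}")
print(f"LLM, same severity: {llm_same_severity:.0f}")
print(f"LLM, 20x severity:  {llm_worse_severity:.0f}")
```

Under these assumed numbers, a 10x lower error rate wins easily when mistakes cost the same, and loses if each mistake is more than 10x as expensive, which is the whole caveat.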

Also I’m not making any value judgement about the technology and how it’s used and, frankly, I don’t really appreciate the assumption that I am. I’m fucking job hunting right now and it’s a hellscape thanks to LLMs.

I’m just being real about where the tech appears to be going based on its current trajectory and my experience in the industry. There seems to be a lot of cope going around that these things won’t ever be good enough to take our jobs. They will, and sooner than any of us are ready for. Leadership is fine with slop as long as it ships; the tech as it stands today doesn’t have to be much better to reach that standard.