Hacker News
by sk11001 918 days ago
This isn't how LLMs work - they're trained to be helpful and give correct responses. You could argue about how well that's achieved, but the point is that it's not a weighted average of the internet; it's been fine-tuned towards correctness and helpfulness.
2 comments

I thought one of the biggest arguments against them was the fact that they inherently aren't good at being correct? They just produce something that looks like a human wrote it real good.
> They just produce something that looks like a human wrote it real good.

That's what the base model does. To get an LLM assistant there are additional training phases to make it conversational, helpful and more correct - https://www.youtube.com/watch?v=bZQun8Y4L2A

Sure… and so far there always appears to be a way of breaking that fine-tuning; see the recent paper on training data extraction I linked in another comment below.
There's a big difference between being breakable and being representative of the web content used for training like you claimed earlier.
Again, see the link… get it to repeat the same word over and over and it can start giving you back its raw training data. We’re still discovering the (potentially limitless) ways these things can be tricked into regurgitating what they were trained on; it’s entirely possible there’s no way to stop them doing so.