by sigmoid10 329 days ago
You are describing the state of LLMs from 2 years ago, when they were basically just pre-trained on the internet and then fine-tuned to follow a particular instruction format. Current models still use this as a first step, but are then trained extensively using reinforcement learning, which has given them much better skills at reasoning and logic than human-tainted data ever could. See how Grok 4, for example, still eagerly dismisses all those right-wing hoaxes, despite being massively tuned to favour right-wingers by its creators carefully selecting pre-training data.

You have some sort of very confused idea of what reinforcement learning is. (Which is probably why you're being downvoted.)
I suggest you read something like the DeepSeek R1 paper, because you and everybody else here seem to have no clue how it works (which is not surprising tbh).