HN Mirror



	by sigmoid10 376 days ago
	You are describing the state of LLMs from 2 years ago. Back then, they were basically just pre-trained on the internet and then fine-tuned to follow a particular instruction format. Current models still use this as a first step, but are then trained extensively with reinforcement learning, which has given them much better reasoning and logic skills than human-tainted data ever could. See how Grok 4, for example, still eagerly dismisses all those right-wing hoaxes, despite being massively tuned by its creators to favour right-wingers through carefully selected pre-training data.


otabdeveloper4 376 days ago

You have some sort of very confused idea of what reinforcement learning is. (Which is probably why you're being downvoted.)


sigmoid10 375 days ago

I suggest you read something like the DeepSeek R1 paper, because you and everybody else here seem to have no clue how it works (which is not surprising, tbh).
