by sigmoid10
329 days ago
You are describing the state of LLMs from two years ago, when they were basically just pre-trained on the internet and then fine-tuned to follow a particular instruction format. Current models still use this as a first step, but are then trained extensively with reinforcement learning, which has given them much better reasoning and logic skills than human-tainted data ever could. See, for example, how Grok 4 still eagerly dismisses all those right-wing hoaxes, despite being massively tuned to favour right-wingers by its creators carefully selecting pre-training data.