by CuriouslyC
130 days ago
That's pre-training. Post-training with RL can make models arbitrarily good at specific capabilities, and it's usually done with pooled human experts, so it's definitely not statistically mediocre. The issue is that we're not modelling the problem itself, but a proxy for it. RL doesn't generalize very well as it is, and when you apply it to a loose proxy measure you get the abysmal data efficiency we see with LLMs. We might be able to brute-force "AGI", but we'd certainly do better with something more direct that generalizes better.