by fnqi8ckfek
537 days ago
I don't buy it. LLMs can already put together long phrases without needing RL for training. And crucially, those long phrases _make sense_; they're not just syntactically correct, which is what you'd expect from learning to predict the next word. So clearly it's possible to get long-range correlations right even without RL.