by fnqi8ckfek 537 days ago
I don't buy it. LLMs can already put together long phrases without needing RL in training. And crucially, those long phrases _make sense_; they're not just syntactically correct, which is what you'd expect from learning to predict the next word.

So clearly it's possible to get long-range correlations right even without RL.