Hacker News
by DevelopingElk, 193 days ago
RL before LLMs could very much learn new behaviors. Take a look at AlphaGo for that. RL agents can also learn to drive in simulated environments. RL in LLMs doesn't learn the same way, so it can't create its own behaviors.