by InvidFlower
468 days ago
Well, I'm not sure that part matters as much (from first principles). The more important part is that RL lets a model figure out which methods are effective for it. Most of the time it probably already has the tools from pre-training, but doesn't "make the connection" to use them (or at least not often enough).