by cma
424 days ago
I'm pretty sure RL causes catastrophic forgetting of the model's base knowledge, and that's why o3 hallucinates so much more. If you mess around with trained weights, you're going to delete some base knowledge, at least the knowledge that is outside of the tasks you RL on.