by mountainriver
252 days ago
A fair amount of research has shown that RL doesn’t add knowledge to the base model; it just optimizes paths that already exist.
Now ProRL from Nvidia showed there are ways of adding knowledge, mostly through progressive merging. I’m still not fully convinced of the 1-bit claim; they made other mistakes in the blog post.