|
|
|
|
|
by phh
383 days ago
|
|
Cool cool. I'm a bit put off by calling it "reasoning" /"thought". These RL targets can be achieved without "thinking" model but still cool. Gotta love the brainfuck task. I personally think that Gemini 2.5 Pro's superiority comes from having hundreds or thousands RL tasks (without any proof whatsoever, so rather a feeling). So I've been wanting a "RL Zoo" for quite a while. I hope this project won't be a one-off and will be maintained long term with many external contributions to add new targets! |
|