|
|
|
|
|
by PontifexCipher
499 days ago
|
|
Some info that may be missing: - v2/v3 (not r1) seem to be cloned from o1/4o output, and perform worse (this cost the oft-repeated 5ish mm USD) - r1 is specifically a reasoning step (using RL) _on top of_ v2/v3 and performs similarly to o1 (the cost of this is _not reported anywhere_) - In the o1 blog post, they specifically say they use RL to add reasoning to LLMs: https://openai.com/index/learning-to-reason-with-llms/ |
|