by srush
547 days ago
That's a good point. We don't see that in our experiments because they're all in the math domain. However, for OAI it's plausible that training for o1 might conflict with standard instruction training, leading to a less human-preferred output style.