by lossolo
305 days ago
Without a provable holdout, the claim that "large models do fine on unseen patterns" is unfalsifiable. In controlled from-scratch training, CoT performance collapses under modest distribution shift, even when the chains look plausible. If you have results where the transformation family is provably excluded from training and a large model still shows robust CoT, please share them. Otherwise this paper’s claim stands for the regime it tests.
|
What would be your argument against the following?
1. CoT models perform far better on benchmarks than non-CoT models.
2. People choose CoT models for day-to-day use because they find they actually give better results.