by drzoltar
1689 days ago
It’s frustrating how myopic these papers can be. The goal of the paper seems to be to work solely within the GPT framework to test the theory of verifiers. Why not try verifiers with other models? Perhaps it’s not a fair comparison, but I remember a Kaggle competition [0] from six years ago that involved building models to answer grade-school science multiple-choice questions. A simple word2vec model could already achieve 50% accuracy. Even if multiple choice is (maybe?) easier than free response, I’m skeptical that the way to solve these problems is to throw billions of weights at them. I’m also not convinced that this new dataset doesn’t suffer from a much smaller template space, such that the models still just memorize templates.

[0]: https://www.kaggle.com/c/the-allen-ai-science-challenge/over...