by mluo
499 days ago
We beat o1-preview, and even many 7B models, on many math benchmarks, all evaluated on TEST sets (not in the training set at all). If you want to make the model fully generalist, feel free to train it on coding datasets (e.g., RL with passing unit tests as the reward).
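For readers unfamiliar with "passing unit tests as the reward", here is a minimal sketch of what such a reward function could look like, assuming a binary reward (1.0 if every test passes, 0.0 otherwise) and Python candidate solutions. The function name `unit_test_reward` is hypothetical and not taken from any particular training framework. It runs the code in a plain subprocess with no sandboxing, which you would not want to do with untrusted model output.

```python
import os
import subprocess
import sys
import tempfile


def unit_test_reward(candidate_code: str, test_code: str, timeout_s: float = 10.0) -> float:
    """Hypothetical binary reward: 1.0 if the candidate passes all tests, else 0.0.

    Assumption: tests are plain `assert` statements appended to the candidate,
    so any failing assertion or exception yields a nonzero exit code.
    WARNING: no sandboxing; only run code you trust.
    """
    program = candidate_code + "\n\n" + test_code + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        # Run candidate + tests in a fresh interpreter process.
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        # Infinite loops or very slow solutions get zero reward.
        return 0.0
    finally:
        os.remove(path)


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"
    print(unit_test_reward(good, tests))  # 1.0
    print(unit_test_reward(bad, tests))   # 0.0
```

A binary pass/fail signal is the simplest choice; partial credit (the fraction of tests passed) is a common variant, but that is a design decision, not something stated here.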
Side question, since it sounds like you were involved: how big is the impact on benchmarks of taking this 1.5B model down from fp32 to fp8 or similar? The focus on parameters alone sometimes feels like comparing house sizes by their lengths alone. And, if you were indeed involved, thanks for making all of this open and available!
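To make the house-size analogy concrete, the weight-only memory footprint scales with bytes per parameter, so the same nominal "1.5B" model occupies very different amounts of memory depending on precision. The sketch below assumes exactly 1.5e9 parameters and counts weights only (no activations, KV cache, or quantization scale overhead). It says nothing about the benchmark impact, which has to be measured.

```python
# Bytes per parameter for common storage precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 1.5e9  # assumed parameter count for a "1.5B" model
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(n, dtype):.2f} GB")
    # fp32: 6.00 GB, bf16: 3.00 GB, fp8: 1.50 GB, int4: 0.75 GB
```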