by dimitry12
539 days ago
I believe this is a valid point: HF's replication does indeed use a larger off-the-shelf model as the verifier. In contrast, in the original paper the verifier is a fine-tune of the exact same base model that is used to sample the step-by-step solutions (the "solver").

Using a 3B model with an 8B verifier against a 70B model would make sense too. That said, their setup barely crossed the 70B line with 256 samples. That is 256*(8+3)/70 ≈ 40 times more computationally expensive than running the 70B model as is.
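The back-of-envelope estimate above can be written out as a small helper. This is only a rough sketch that assumes compute per sample scales linearly with parameter count, with the solver generating each sample once and the verifier scoring it once at a similar token length. It ignores prompt caching, batching, and any sharing of work between samples. The function name and parameters are hypothetical, chosen for illustration.

```python
def best_of_n_cost_ratio(n_samples: int, solver_b: float,
                         verifier_b: float, baseline_b: float) -> float:
    """Hypothetical helper: compute of best-of-N (solver + verifier per sample)
    relative to one pass of a baseline model, assuming cost is proportional
    to parameter count (in billions)."""
    return n_samples * (solver_b + verifier_b) / baseline_b


# 256 samples from a 3B solver, each scored by an 8B verifier, vs. one 70B pass
ratio = best_of_n_cost_ratio(256, solver_b=3, verifier_b=8, baseline_b=70)
print(f"{ratio:.1f}x")  # 40.2x
```

Under these assumptions the ratio is 2816/70 ≈ 40.2, matching the "~40 times" figure. Real costs could differ, since generation and verification have different token counts and hardware utilization.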