|
|
|
|
|
by energy123
193 days ago
|
|
I am not sure how to interpret the first paper's results. If we use a random number generator then we will converge to 100% correct answers under pass@n in the limit. A random number generator will eventually outperform or match all models (for large n) whenever top-p is less than 1 because the other models will most likely have some level of bias that makes correct CoTs mathematically impossible due to the tokens being too improbable and being filtered out by top-p, meaning that other models will asymptote to below 100% while the RNG will reach 100% in an almost surely sense. Under this paper's logic doesn't that mean that the random number generator is a superior reasoner? |
|
On the unstructured outputs, where you can’t just ratchet up the pass@k until it’s almost random, it switches the base model out for instruct, and in the worse case on livecodebench it uses a qwen-r1-distill as a _base_ model(!?) that’s an instruct model further fine tuned on R1’s reasoning traces. I assume that was because no matter how high the pass@k, a base model won’t output correct python.