|
|
|
|
|
by bluecoconut
641 days ago
|
|
This power law behavior of test-time improvement seems to be pretty ubiquitous now. In more agents is all you need [1], they start to see this as a function of ensemble size. It also shows up in: Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [2] I sorta wish everyone would plot their y-axis with logit y-axis, rather than 0->100 accuracy (including the openai post), to help show the power-law behavior. This is especially important when talking about incremental gains in the ~90->95, 95->99%. When the values (like the open ai post) are between 20->80, logit and linear look pretty similar, so you can "see" the inference power-law [1] https://arxiv.org/abs/2402.05120
[2] https://arxiv.org/abs/2407.21787 |
|