|
|
|
|
|
by imiric
51 days ago
|
|
> So this is proof of the models actually getting stronger (previous generations of LLMs were unable to solve this one). No, it's not. While I don't dispute that new models may perform better at certain tasks, the fact that someone was able to use them to solve a novel problem is not proof of this. LLM output is nondeterministic. Given the same prompt, the same LLM will generate different output, especially when it involves a large number of output tokens, as in this case. One of those attempts might produce a correct output, but this is not certain, and is difficult if not impossible for a human not expert in the domain to determine this, as shown in this thread. |
|