by gitaarik
620 days ago
Isn't it because this test has since spread across the internet and the LLMs have picked up on it, so now they give the correct answer? Maybe try a new, unique logic question, and not the same question with a few words changed, because that might still closely match data the LLM has already seen.

I tested the models 4 days after the paper was published.
The models are retrained every few months, and the process takes much more than 4 days.