by ikurei
11 days ago
Qwen 3.7 Max:
> During my local testing before the full eval harness it was the only non-GPT model that was able to complete the task, was not able to reproduce in the longer runs.

Doesn't that sound like maybe the harness was the problem?