by denysvitali 166 days ago
Better link: https://iquestlab.github.io/

But yes, sadly it looks like the agent cheated during the eval.


According to https://github.com/IQuestLab/IQuest-Coder-V1/issues/14#issue... the result is still good after fixing the cheating problem: 76.2% (down from 81.4%), which still beats Opus 4.5 (74.4%)!!
Unfortunately they seem to have neglected to update the README on their repo's front page with this information, so it continues to mislead people: https://github.com/IQuestLab/IQuest-Coder-V1
It is updated on their actual home page, though. There is clearly no intent to mislead people.

https://iquestlab.github.io

What do you mean? Opus 4.5 and GPT 5.2 broke the 80% mark, and no other model seems to have passed this important milestone yet.
The link didn’t get enough votes a few days ago.
I know - I posted it :)