HN Mirror



	by denysvitali 166 days ago
	Better link: https://iquestlab.github.io/
	But yes, sadly it looks like the agent cheated during the eval.

2 comments

According to https://github.com/IQuestLab/IQuest-Coder-V1/issues/14#issue... the result is still good after fixing the cheating problem: 76.2% (down from 81.4%), which still beats Opus 4.5 (74.4%)!!

Unfortunately, they seem to have neglected to update the README on their repo's front page with this information, so it continues to mislead people: https://github.com/IQuestLab/IQuest-Coder-V1

It is updated on their actual home page, though. There is clearly no intent to mislead people.

What do you mean? Opus 4.5 and GPT 5.2 broke the 80% mark, and no other model seems to have passed this important milestone yet.

The link didn’t get enough votes a few days ago.

I know - I posted it :)