Hacker News
by tuhgdetzhh 122 days ago
The test is rigged because they used non-thinking models.
2 comments

Testing some subset X does not mean the test is rigged unless they failed to disclose that.

But also:

GPT 5.2 Thinking, Standard Effort: Walk - https://chatgpt.com/share/699d38cb-e560-8012-8986-d27428de8a...

I'm assuming "GPT 5.2 Thinking" is, in fact, a thinking model?

The problem is that you haven't used the API; you used your ChatGPT subscription, with its personality, memories, and possibly customization. I can see, for instance, that your ChatGPT answers with emojis, while my ChatGPT subscription never does.

If you ask GPT 5.2 with high reasoning effort in the API, you get 10 out of 10: drive.
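For anyone who wants to repeat this kind of "N out of N" check, here is a minimal sketch of the tally step. It assumes you supply your own `ask` function that sends the prompt to the model (for example through the API with reasoning effort set to high) and returns a one-word answer; `tally_answers` and `fake_ask` are hypothetical names for illustration, and the stub below stands in for a real request.

```python
from collections import Counter
from typing import Callable


def tally_answers(ask: Callable[[str], str], prompt: str, runs: int = 10) -> Counter:
    """Send the same prompt `runs` times and count the normalized answers."""
    counts: Counter = Counter()
    for _ in range(runs):
        # Normalize case and whitespace so "Drive" and " drive" count together.
        answer = ask(prompt).strip().lower()
        counts[answer] += 1
    return counts


def fake_ask(prompt: str) -> str:
    # Stand-in for a real model call (an assumption for this sketch);
    # replace it with a wrapper around whichever client you actually use.
    return "Drive"


if __name__ == "__main__":
    result = tally_answers(fake_ask, "placeholder prompt", runs=10)
    print(result)  # Counter({'drive': 10})
```

Running the same tally once through the API and once through the web product would make the difference between the two setups explicit instead of relying on a single shared chat.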

If it doesn't work at all on the most popular pricing plan (subscription), AND it doesn't work through the most popular way of accessing it (web), then it seems fair to say there's a problem.

And the problem is NOT that I'm using a product in the advertised, intended way.

These are reasoning / thinking models
Source?
I don't know, but model names such as "kimi-k2-thinking" in the test set might offer a clue.
Yes, there are some exceptions where it clearly states that a thinking model was chosen, as with kimi, but there is no such indicator for the GPT family from OpenAI or for the other major models.