| HN Mirror



	by tuhgdetzhh 122 days ago
	The test is rigged because they used non-thinking models.

2 comments

handoflixue 122 days ago

Testing some subset X does not mean the test is rigged unless they failed to disclose that.

But also:

GPT 5.2 Thinking, Standard Effort: Walk - https://chatgpt.com/share/699d38cb-e560-8012-8986-d27428de8a...

I'm assuming "GPT 5.2 Thinking" is, in fact, a thinking model?


randomtoast 122 days ago

The problem is that you didn't use the API; you used your ChatGPT subscription, with its personality, memories, and possibly customization. I can see, for instance, that your ChatGPT answers with emojis, while my ChatGPT subscription never does.

If you ask GPT 5.2 with high reasoning effort in the API, you get 10 out of 10: drive.


handoflixue 119 days ago

If it doesn't work at all on the most popular pricing plan (subscription), AND it doesn't work through the most popular way of accessing it (web), then it seems fair to say there's a problem.

And the problem is NOT that I'm using a product in the advertised, intended way.


felix089 122 days ago

These are reasoning / thinking models.


tuhgdetzhh 122 days ago

Source?


tverbeure 122 days ago

I don't know, but model names such as "kimi-k2-thinking" in the test set might offer a clue.


etyhhgfff 122 days ago

Yes, there are some exceptions where it clearly states that a thinking model was chosen, as for kimi, but there is no such indicator for the GPT family from OpenAI or for the other major models.
