


	by JohnKemeny 275 days ago
	There is a clear difference between what OpenAI manages to do with GPT-5 and what I manage to do with GPT-5. The other day I asked for code to generate a linear regression and it gave back a figure of some points and a line through them. If GPT-5, as claimed, is able to solve all problems in ICPC, please give the instructions on how I can reproduce it.

4 comments

theptip 275 days ago

I believe this is going to be an increasingly important factor.

Call it the “shoelace fallacy”: Alice is supposedly much smarter but Bob can tie his shoelaces just as well.

The choice of eval, prompt scaffolding, etc. all dramatically impact the intelligence that these models exhibit. If you need a PhD to coax PhD performance from these systems, you can see why the non-expert reaction is “LLMs are dumb” / progress has stalled.


paxys 275 days ago

Yeah, until OpenAI says "we pasted the questions from ICPC into chatgpt.com and it scored 12/12" the average user isn't really going to be able to reproduce their results.


anthonypasq 275 days ago

The average person doesn't need to do that. The benchmark for "is this response accurate and personable enough" on any basic chat app has been saturated for at least a year at this point.


SamPatt 274 days ago

The average user will never need to answer ICPC questions though.


Jensson 274 days ago

No, but average users have things they want to do that require ICPC-level problem solving, like making optimized games. Average users want that for sure.


simianwords 275 days ago

Are you using the thinking model or the non-thinking model? Maybe you can share your chat.


JohnKemeny 275 days ago

I prefer not to due to privacy concerns. Perhaps you can try yourself?

I will say that after checking, I see that the model is set to "Auto" and, as mentioned, it took almost 8 minutes. The prompt I used was:

    Solve the following problem from a competitive programming contest. Output only the exact code needed to get it to pass on the submission server.

It did a lot of thinking, including

    I need to tackle a problem where no web-based help is available. The task involves checking if a given tree can be the result of inserting numbers 1 to n into an empty skew heap, following the described insertion algorithm. I have to figure out the minimal and maximal permutations that produce such a tree.

And I can see that it visited 13 webpages, including icpc, codeforces, geeksforgeeks, github, tehrantimes, arxiv, facebook, stackoverflow, etc.
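
For anyone who wants to poke at the task described in that reasoning trace, here is a minimal sketch of what "inserting numbers 1 to n into an empty skew heap" looks like. The thread does not include the problem's exact insertion rule, so this assumes the textbook skew heap (merge along the right spine, then swap children). The names `Node`, `merge`, `build`, `shape`, and `producing_permutations` are made up for illustration, and the brute-force search is only usable for tiny n, nowhere near contest limits.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def merge(a: Optional[Node], b: Optional[Node]) -> Optional[Node]:
    """Textbook skew-heap merge (assumed; the contest statement may differ)."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a  # keep the smaller root on top (min-heap)
    a.right = merge(a.right, b)          # merge into the right subtree
    a.left, a.right = a.right, a.left    # then unconditionally swap children
    return a


def build(perm):
    """Insert the keys of perm one by one into an initially empty heap."""
    root = None
    for key in perm:
        root = merge(root, Node(key))
    return root


def shape(node):
    """Nested-tuple view of a tree: (key, left, right), or None if empty."""
    if node is None:
        return None
    return (node.key, shape(node.left), shape(node.right))


def producing_permutations(target, n):
    """Brute force: every insertion order of 1..n that yields `target`.

    permutations() enumerates in lexicographic order here, so the first and
    last entries are the minimal and maximal producing permutations.
    """
    return [p for p in permutations(range(1, n + 1))
            if shape(build(p)) == target]
```

Under these assumptions, `producing_permutations(shape(build([1, 2, 3])), 3)` returns `[(1, 2, 3), (2, 1, 3)]`, so the minimal and maximal insertion orders for that tree are 1 2 3 and 2 1 3. A real submission would need something far smarter than trying all n! orders.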


jsnell 275 days ago

Giving a terse prompt and expecting a one-shot answer is really not how you'd get an LLM to solve complex problems.

I don't know what DeepMind and OpenAI did in this case, but to get an idea of the kind of scaffolding and prompting strategy that one might want, have a look at this paper where some folks used the normal generally available Gemini 2.5 Pro to solve 5/6 of the 2025 IMO problems: https://arxiv.org/pdf/2507.15855


minimaxir 275 days ago

The point of the GPT-5 model is that it is supposed to route between thinking/nonthinking smartly. Leveraging prompt hacks such as instructing it to "think carefully" to force routing to the thinking model goes against OpenAI's claims.


Workaccount2 275 days ago

Just select GPT5-thinking if you need anything done with competence. The regular gpt5 is nothing impressive and geared more towards regular daily life chatting.


koakuma-chan 275 days ago

Are you sure? I thought you could only specify reasoning_effort and that's it.


levocardia 275 days ago

If you can't get a modern LLM to generate a simple linear regression I think what you have is a problem between the keyboard and the chair...
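
For scale, a "simple linear regression" in the ordinary least-squares sense fits in a dozen lines of plain Python. This is only a sketch of the usual closed-form fit of y = slope * x + intercept; the function name `fit_line` is made up here, and nothing in the thread says what code (if any) the model actually produced.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers a slope of 2 and an intercept of 1, since those points lie exactly on y = 2x + 1.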
