by freshtake 293 days ago
An interesting debate!

A few things to consider:

1. This is one example. How many other attempts did the person try that failed to be useful, accurate, coherent? The author is an OpenAI employee IIUC, so it begs this question. Sora's demos were amazing until you tried it, and realized it took 50 attempts to get a usable clip.

2. The author noted that humans had updated their own research in April 2025 with an improved solution. For cases where we detect signs of superior behavior, we need to start publishing the thought process (reasoning steps, inference cycles, tools used, etc.). Otherwise it's impossible to know whether this used a specialty model, had access to the more recent paper, or in other ways got lucky. Without detailed proof, it's becoming harder to separate legitimate findings from marketing posts (not suggesting this specific case was a pure marketing post).

3. Points 1 and 2 would help with reproducibility, which is important for scientific rigor. If we give Claude the same tools and inputs, will it perform just as well? This would help the community understand whether GPT-5 is novel, or whether the novelty is in how the user is prompting it.


I don't mean to be cynical, but I don't think these points matter as much as you think, at least not in practice. The hardest part of a proof is working out the intermediate steps; joining them up is often trivial, even for a student. So even if it works out a few good steps or finds an effective theorem to apply, and does so only once in every hundred prompts, the time savings can be significant.

I should know: I've been using LLM thinking models to help brainstorm ideas for stickier proofs. They've been more successful at discovering esoteric entry points than I would like to admit.

> This is one example. How many other attempts did the person try that failed to be useful, accurate, coherent? The author is an OpenAI employee IIUC, so it begs this question. Sora's demos were amazing until you tried it, and realized it took 50 attempts to get a usable clip.

If you could combine this with automated theorem proving, it wouldn't matter if it was right only 1 out of 1000 times.

The most difficult part of automated theorem proving is not the "tactic" part but the formulation.

(Theory building is quite hard in math; the computation side is only hard after a point).

Perhaps 1/1000 would be a useful rate, but numbers go a lot smaller than 1/1000.
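
To put rough numbers on the "1 out of 1000" argument, here is a minimal sketch. It assumes every attempt is independent, succeeds with the same probability p, and is checked by a perfect verifier (for example a proof checker); none of these assumptions come from the thread, and the function names are made up for illustration.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    # Equivalent to 1 - (1 - p)**n, written with log1p/expm1 so it stays
    # accurate when p is very small.
    return -math.expm1(n * math.log1p(-p))


def attempts_needed(p: float, target: float = 0.95) -> int:
    """Smallest n for which prob_at_least_one(p, n) reaches target."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    # Solve (1 - p)**n <= 1 - target for n.
    return math.ceil(math.log1p(-target) / math.log1p(-p))


if __name__ == "__main__":
    # Assumed success rates: the 1/50 and 1/1000 figures mentioned in the
    # thread, plus a much smaller one picked purely for comparison.
    for p in (1 / 50, 1 / 1000, 1e-6):
        n = attempts_needed(p)
        print(f"p = {p:g}: mean attempts = {1 / p:,.0f}, attempts for 95% = {n:,}")
```

Under these assumptions the number of attempts grows roughly like 3/p for a 95% chance of at least one verified success, so whether a tiny success rate is tolerable depends on how cheap generating and checking each attempt is.
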
> This is one example. How many other attempts did the person try that failed to be useful, accurate, coherent?

High chance, given that this is the same guy who came up with the SVG unicorn (sparks of AGI), which raises the same question even more obviously.

4. How many times has this happened already, but the human took credit for the output because they don't have the incentive to give credit to the LLM?
I'd say a lot of people even have an incentive to not give credit to the LLMs, because there is a social stigma associated with using AI, due to its association with low-quality work.
I'm guessing the music business right now is absolutely awash with unreported and uncredited AI lyrics and backing tracks. It's an area where you can get away with it a lot easier than in the visual arts.
People are delusional. There’s a large cohort of folks on HN who still think AI is just a stochastic parrot. Depending on the topic or the thread, you’ll find more of those people and get voted down if you even imply that LLMs can reason.
I think many humans are stochastic parrots, so in a sense those parrots are damn right.
Why "many"? Why not "all"?
Obviously I'm not a stochastic parrot :P
My claim is the LLM is not a stochastic parrot. So technically they're wrong. But of course I understand your point.
They are to some extent, though. The bigger point is that they are not just a stochastic parrot. But examples like the modified riddles, where they just answer the original riddle, show that they have the behaviour of stochastic parrots at least some of the time.
I don’t think it’s that they don’t have the incentive. I think it’s because it’s unclear whether giving credit to the LLM would mean that OpenAI or a similar company is considered an author, which could really screw up intellectual property and make using LLMs much less attractive. If the LLM wants attribution, then it’s sentient, and if it’s sentient, it may be given personhood (Johnny-five scenario) and get rights. Then it would be a writer, it could influence the license, and intellectual property may belong partially to it unless it willingly became an employee of a ton of companies and organizations or contracted with them.