| HN Mirror



	by dr_dshiv 588 days ago
	> they’re merely regurgitating memorized information
	Source?

3 comments

llm_trw 588 days ago

If a model can't innately reason over 5 steps in a simple task but produces a flawless 500-step proof, you either have divine intervention or memorisation.

NitpickLawyer 588 days ago

AlphaGeometry has entered the chat.

Also, AIMOv2 is doing stage 2 of their math challenge, they are now at "national olympics" level of difficulty. They have a new set of questions. Last year's winner (27/50 points) got 2/50 on the new set. In the first 3 weeks of the competition the top score is 10/50 on the new set, mostly with Qwen2.5-math. Given that this is a purposefully made new set of problems, and according to the organizers "made to be AI hard", I'd say the regurgitation stuff is getting pretty stale.

Also also, the fact that Claude 3.5 can start coding in an invented language w/ ~20-30k tokens of "documentation" about the invented language is also some kind of proof that the stochastic parrots are the dismissers in this case.

llm_trw 588 days ago

I've not tested those models. Feel free to flick me through a couple of k in bitcoins if you'd like me to have a look for you.

firebaze 588 days ago

I'm not sure if it is feasible to provide all relevant sources to someone who doesn't follow a field. It is quite common knowledge that LLMs in their current form have no ability to recurse directly over a prompt, which inherently limits their reasoning ability.

dr_dshiv 587 days ago

I am not looking for all sources. And I do follow the field. I just don’t know the sources that would back the claim they are making. Nor do I understand why limits on recursion mean there is no reasoning and only memorization.

light_hue_1 588 days ago

This is just totally false.

That's exactly what countless techniques related to chain of thought do.

llm_trw 588 days ago

The closest explanation of how chain of thought works is that it suppresses the probability of a termination token.

People have found that even letting LLMs generate gibberish tokens produces better final outputs. Which isn't a surprise when you realise that the only way an LLM can do computation is by outputting tokens.
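
To make "suppressing the probability of a termination token" concrete, here is a minimal toy sketch. It is only an illustration of the claim above, not an account of how any real model or chain-of-thought method is implemented. The 4-token vocabulary, the logit values, and the helper names `softmax` and `suppress_eos` are all made up for the example.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def suppress_eos(logits, eos_id, penalty):
    """Return a copy of `logits` with the termination token's score lowered."""
    adjusted = list(logits)
    adjusted[eos_id] -= penalty
    return adjusted

# Hypothetical 4-token vocabulary; index 3 stands in for the termination token.
logits = [1.0, 0.5, 0.2, 2.0]
EOS_ID = 3

p_before = softmax(logits)[EOS_ID]
p_after = softmax(suppress_eos(logits, EOS_ID, penalty=5.0))[EOS_ID]

# With the termination token less likely, sampling keeps emitting tokens,
# i.e. the model gets more steps of "computation" before it stops.
print(f"P(stop) before: {p_before:.3f}, after: {p_after:.3f}")
```

The only point is that lowering one logit shifts probability mass to the other tokens, so generation tends to run longer before stopping.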

dr_dshiv 588 days ago

It’s sometimes like, are these critics using the tools? It’s a strange schism at the moment.

llm_trw 588 days ago

It's my job to build these tools. I'm well aware of their strengths and shortcomings.

dr_dshiv 587 days ago

Unless you are building one of the frontier models, I’m not sure that your experience gives you insight on those models. Perhaps it just creates needless assumptions.

llm_trw 587 days ago

I'm building the layer on top of the models.

People call it agentic AI but that's a word without a definition.

Needless to say better LLMs help with my work, the same way that a stronger horse makes plowing easier.

The difference is that, unlike the horse breeders (e.g. Anthropic and OpenAI), I want to get to the internal combustion engine and tractors.

exe34 588 days ago

He just explained it to you.
