by Balinares 260 days ago
Wow, so not only are the findings from https://arxiv.org/abs/2506.21734 (posted on HN a while back) confirmed, they're generalizable? Intriguing. I wonder if this will pan out in practical use cases; if so, it'd be transformative.

Also would possibly instantly void the value of trillions of pending AI datacenter capex, which would be funny. (Though possibly not for very long.)

Any mention of "HRM" is incomplete without this analysis:

https://arcprize.org/blog/hrm-analysis

This here looks like a stripped down version of HRM - possibly drawing on the ablation studies from this very analysis.

Worth noting that HRMs aren't generally applicable in the same way normal transformer LLMs are. Or, at least, no one has found a way to apply them to the typical generative AI tasks yet.

I'm still reading the paper, but I expect this version to be similar - it uses the same tasks as HRMs as examples. Possibly quite good at spatial reasoning tasks (ARC-AGI and ARC-AGI-2 are both spatial reasoning benchmarks), but it would have to be integrated into a larger, more generally capable architecture to go past that.

That's a good read also shared by another poster above, thanks! If I'm reading this right, it contextualizes, but doesn't negate the findings from that paper.

I've got a major aesthetic problem with the fact that LLMs require this much training data to get where they are, namely, "not there yet"; it's brute force by any other name, and just plain kind of vulgar. More importantly, it won't scale much further. Novel architectures will have to feature in at some point, and I'll gladly take any positive result in that direction.

Evolution is brute force by any other name. Nothing elegant about it. Nonetheless, here you are.

Poor sample efficiency of the current AIs is a well known issue - but you should keep in mind what kind of grisly process was required to give you the architecture that makes you as sample efficient as you are.

We don't know yet what kind of architectural quirks enable this sample efficiency in the human brain. It could be something like a non-random initialization process that confers the right inductive biases, a more efficient optimizer, recurrent background loops... or just more raw juice.

It might be that one biological neuron is worth 10,000 LLM weights, and a big part of how the brain is so sample efficient is that it's hilariously overparameterized.

Brute force:

    for i in range(1, 100000000):
        if i == 66666654:
            print(i)
            break

GA:

    # pop, crossover, tournament and heuristic_fn are placeholders
    for g in range(1, 101):
        pop, best = crossover(tournament(pop, heuristic_fn))
        print(best.value)
        if best.fitness < 0.01:
            break

GA uses a heuristic to converge. If that is brute force, so is binary search.

> If that is brute force, so is binary search.

Binary search is guaranteed to find the target if it exists, so it's not a heuristic. GA isn't, as it can get stuck in local minima. However, I agree that GA isn't brute force.
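
To make the distinction concrete, here is a minimal sketch. It is not a GA: it uses plain greedy hill climbing as a much simpler stand-in for a heuristic-guided search, and the two-peak `fitness` function is a made-up landscape chosen only to show the failure mode. Binary search on sorted data always gives a definite answer, while the greedy search can stop on the smaller peak depending on where it starts.

```python
from bisect import bisect_left


def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or None if it is absent."""
    i = bisect_left(sorted_values, target)
    if i < len(sorted_values) and sorted_values[i] == target:
        return i
    return None


def fitness(x):
    """Hypothetical landscape: local peak at x=2 (height 3), global peak at x=8 (height 10)."""
    return max(3 - abs(x - 2), 10 - 3 * abs(x - 8))


def hill_climb(fitness_fn, start, lo, hi):
    """Greedy local search over the integers in [lo, hi].

    Moves to the better neighbor until no neighbor improves on the
    current point, then returns that point.
    """
    x = start
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbors, key=fitness_fn)
        if fitness_fn(best) <= fitness_fn(x):
            return x
        x = best


values = list(range(0, 100, 3))
print(binary_search(values, 66))       # present in the list
print(binary_search(values, 67))       # absent from the list
print(hill_climb(fitness, 0, 0, 10))   # starts near the smaller peak
print(hill_climb(fitness, 6, 0, 10))   # starts near the larger peak
```

Starting at 0, the greedy search halts on the smaller peak because both neighbors are worse, even though a better point exists further away. A GA's population and crossover reduce that risk but don't remove it.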

Heuristic just means there is a function telling you where to go. For A* it is the estimated distance to the goal, for binary search it is the less-than-or-equal comparison, for gradient descent it is the gradient (via an optimizer such as Adam).

Yeaaaaaah, I kinda doubt there's much coming from evolutionary biases.

If it's a matter of clever initialization bias, it's gotta be pretty simple to survive the replication via DNA and procedural generative process in the meat itself, alongside all of the other stuff which /doesn't/ differentiate us from chimpanzees. Likely simple enough that we would just find something similar ourselves through experimentation. There are also plenty of examples of people learning Interesting Unnatural Stuff using their existing hardware (e.g., echolocation, haptic vision, ...) which suggests generality of learning mechanisms in the brain.

The brain implements some kind of fairly general learning algorithm, clearly. There's too little data in the DNA to wire up 90 billion neurons the way we can just paste 90 billion weights into a GPU over a fiber optic strand. But there's a lot of innate scaffolding that actually makes the brain learn the way it does. Things like bouba and kiki, instincts, all the innate little quirks and biases - they add up to something very important.
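
A rough back-of-the-envelope version of the "too little data in the DNA" point, as a sketch only. Every figure here is an assumed round number: about 3.1 billion base pairs for the human genome, 2 bits per base, and 16-bit storage for the 90 billion weights mentioned above.

```python
# All figures are rough, assumed values for illustration only.
GENOME_BASE_PAIRS = 3.1e9  # approximate length of the human genome
BITS_PER_BASE = 2          # four possible bases, so 2 bits each
NUM_WEIGHTS = 90e9         # the "90 billion weights" from the comment above
BYTES_PER_WEIGHT = 2       # assumes 16-bit weights

# Raw information capacity of the genome, ignoring redundancy and non-coding regions.
genome_bytes = GENOME_BASE_PAIRS * BITS_PER_BASE / 8
# Storage needed to spell out every weight explicitly.
weights_bytes = NUM_WEIGHTS * BYTES_PER_WEIGHT
ratio = weights_bytes / genome_bytes

print(f"genome:  {genome_bytes / 1e9:.2f} GB")
print(f"weights: {weights_bytes / 1e9:.0f} GB")
print(f"ratio:   {ratio:.0f}x")
```

Under these assumptions the whole genome holds well under a gigabyte, a couple of hundred times less than the explicit weight dump. That fits the argument that DNA can encode a generative recipe and biases, not a full wiring table.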

For example, we know from neuroscience that humans implement something not unlike curriculum learning - and a more elaborate version of it than what we use for LLMs now. See: sensitive periods. Or don't see sensitive periods - because if you were born blind, but somehow regained vision in adulthood, it'll never work quite right. You had an opportunity to learn to use the eyes well, and you missed it.

Also, I do think that "clever initialization" is unfortunately quite plausible. Unfortunately - because yes, it has to be simple enough to be implemented by something like a cellular automaton, so the reason why we don't have it already is that the search space of all possible initializations a brain could implement is still extremely vast and we're extremely dumb. Plausible - because of papers like this one: https://arxiv.org/abs/2506.20057

If we can get an LLM to converge faster by "pre-pre-training" it on huge amounts of purely synthetic, algorithmically generated meaningless data? Then what are the limits of methods like that?

> Evolution is brute force by any other name.

No, it's not.

That analysis provided a very non-abrasive wording of their evaluation of HRM and its contributions. The comparison with a recursive / universal transformer on the same settings is telling.

"These results suggest that the performance on ARC-AGI is not an effect of the HRM architecture. While it does provide a small benefit, a replacement baseline transformer in the HRM training pipeline achieves comparable performance."

> Also would possibly instantly void the value of trillions of pending AI datacenter capex

GPU compute is not just for text inference. Video generation demand is something I don’t think we’ll saturate for quite a while, even with breakthroughs.

It doesn't matter how much compute you have: you'll always be able to saturate it one way or another with AI, and having more compute will forever be an advantage.

If a breakthrough in AI happens, you'll get multiplied benefits, not a loss.

That does depend on GPUs being more efficient than CPUs for those breakthroughs.

For matrix multiplication that's probably true though.

That depends on how fine-grained the matrix multiplication is (and whether that's actually the core workload). Past a certain scale you can't get away with the brute-force approach of examining every parameter every time. Moore's law doesn't fix that: orders-of-magnitude differences between better and worse algorithms will exist no matter where you are in that progression, allowing for qualitatively different capabilities. That could push the world toward something where the branchy nature of CPUs and their better amenability to true random-access memory make them win out.

The “AI is hype” crowd can’t seem to wrap their little heads around this idea for some reason.

> Also would possibly instantly void the value of trillions of pending AI datacenter capex

I think they would just adopt this idea and use it to continue training huge but more capable models.

Jevons paradox applies here IMHO. Cheaper AI/watt = more demand.
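
A toy illustration of when that holds, as a sketch under a textbook constant-elasticity demand model (quantity proportional to price raised to the power of minus the elasticity). The function name and all numbers are assumptions for illustration, not figures from the paper or from any market data. Total spend only rises after a price drop when the elasticity of demand is above 1.

```python
def total_spend(price, elasticity, base_price=1.0, base_quantity=1.0):
    """Total spend under constant-elasticity demand (an assumed toy model).

    quantity = base_quantity * (price / base_price) ** (-elasticity)
    """
    quantity = base_quantity * (price / base_price) ** (-elasticity)
    return price * quantity


# Cost per unit of AI work halves (price goes from 1.0 to 0.5).
for elasticity in (0.5, 1.0, 1.5):
    spend = total_spend(0.5, elasticity)
    trend = "up" if spend > 1.0 else "down or flat"
    print(f"elasticity {elasticity}: spend relative to before is {trend}")
```

So whether efficiency gains shrink or grow datacenter demand comes down to how elastic demand for AI compute is, which this sketch assumes rather than measures.
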

It would be fitting if the AI bubble was popped by AI getting too good and too efficient