by godelski 291 days ago

  > this isn't really about whether scaling is "dead" 
I think there's a good position paper by Sara Hooker[0] that mentions some of this. The key point is that while the frontier is being pushed by big models with big data, there's a very quiet revolution of models using far fewer parameters (still quite big) and far less data. Maybe "Scale Is All You Need"[1], but that doesn't mean it is practical or even a good approach. It's a shame these research paths have gotten a lot of pushback, especially given today's concerns about inference costs (and the pushback doesn't seem to be decreasing).

  > verifiable rewards
There's also a current conversation in the community over world models: is it actually a world model if the model does not recover /a physics/[2]? The argument for why they should recover a physics is that this means a counterfactual model must have been learned (no guarantees on whether it is computationally irreducible). A counterfactual model gives far greater opportunities for robust generalization. In fact, you could even argue that the study of physics is the study of compression. In a sense, physics is the study of the computability of our universe[3]. Physics is counterfactual, allowing you to answer questions like "What would the force have been if the mass had been 10x greater?" If physics were not counterfactual, we'd require different algorithms for different cases.
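
A toy sketch of the counterfactual point, assuming Newton's second law (F = ma) stands in for "a physics". The function names and numbers are hypothetical and only for illustration. One compact rule answers both the observed case and the "10x mass" question, while a memorized table of observations has nothing to say about the unseen case:

```python
# Hypothetical illustration: a law (counterfactual) vs. a lookup table (not).

def force(mass_kg: float, accel_m_s2: float) -> float:
    """Newton's second law, F = m * a. One rule covers every (m, a) pair."""
    return mass_kg * accel_m_s2

# A "model" that only memorizes what it has observed.
observed = {(2.0, 3.0): 6.0}  # (mass, acceleration) -> force

def lookup_force(mass_kg: float, accel_m_s2: float):
    """Return the memorized force, or None if this case was never observed."""
    return observed.get((mass_kg, accel_m_s2))

factual = force(2.0, 3.0)          # the case we actually observed
counterfactual = force(20.0, 3.0)  # "what if the mass had been 10x greater?"
unseen = lookup_force(20.0, 3.0)   # the table cannot answer this
```

The law is also the far shorter description of the data, which is the compression angle: one line replaces an ever-growing table.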

I'm in the recovery camp. Honestly, I haven't heard a strong argument against it. Mostly it's "we just care that things work," which, frankly, is the primary concern for all of us, isn't it? I'm all for throwing shit at a wall and seeing what sticks; it can be a really efficient method sometimes (especially in early exploratory phases), but I doubt it is the most efficient way forward.

In my experience, as someone who has built models that require orders of magnitude fewer resources for equivalent performance, I cannot stress enough the importance of quality over quantity. The tricky part is defining that quality.

[0] https://arxiv.org/abs/2407.05694

[1] Personally, I'm unconvinced. Despite the success of our LLMs, it's difficult to decouple scale from other variables.

[2] The "a" is important here. There's not one physics per se; there are different models. This is a level of metaphysics most people will not encounter, and it has many subtleties.

[3] I must stress that there's a huge difference between the universe being computable and the universe being a computation. The universe being computable does not mean we all live in a simulation.