by r0ze-at-hn 62 days ago
We’re in a strange era where the information-theoretic foundations of deep learning are solidifying. The 'Why' is largely solved: it’s the efficient minimization of irreversible information loss relative to the noise floor. There is so much waste in scaling models bigger and bigger when the math points to how to do it much more efficiently. One can take a great 70B model and have it run in only ~16GB with no loss in capability and the ability to keep training, but the last few years funding only went for "bigger".

As you noted, the industry has moved the goalposts to Agency and Long-horizon Persistence. The transition from building 'calculators that predict' to 'systems that endure' is a non-equilibrium thermodynamics problem. There are formulas and basic laws at play here that apply to AI just as much as they apply to other systems. Ironically, it is the same math. The same thing that results in a signal persisting in a model will result in agents persisting.

This is my specific niche. I study how things persist. It’s honestly a bit painful watching the AI field struggle to re-learn first principles that other disciplines have already learned. I have a doc I use to help teach folks how the math works and how to apply it to their domain, and it is fun giving it to folks who then stop guessing and know exactly how to improve the persistence of what they are working on. Like the idea of "How many hours we can have a model work" is so cute compared to the right questions.

3 comments

Can you share that document?
Crackpot "Universal Theory of Everything" physics rooted in numerology:

https://meta-r0ze.github.io/Informational-Energetics/Informa...

also interested
> It’s honestly a bit painful watching the AI field struggle to re-learn first principles that other disciplines have already learned.

This is my fear with software development in general. There's a hundred-year-old point of view right next door that'll solve problems and I'm too incurious to see it.

I have a relative with a focus in math education that I've been stealing ideas from, and I think we'd both appreciate a look at your doc if you don't mind.

I think some of it has to do with incentives. Nobody wants to invest in a team to adapt and test other-field lessons that may come out as "there's no free lunch" or "this is equivalent to a hard problem they didn't solve there yet either."

So instead we're more likely to see navel-gazing "singularity" stories that fit with telling your investors they will become fantastically rich.

> One can take a great 70B model and have it run in only ~16GB with no loss in capability and the ability to keep training, but the last few years funding only went for "bigger".

Awesome. What is holding you back? What do you need the funding for?

Presumably $100m to train the 70B model? I think you're assuming that the author meant you can take an existing 70B model and run it in 16GB. But it stands to reason that "no loss in capability" means it had to be trained under those constraints.
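For scale, here is a rough back-of-the-envelope sketch of what "70B in ~16GB" would mean per weight. It assumes ~16GB means 16 × 10^9 bytes and counts only the weights (no activations, KV cache, or optimizer state); the function names are made up for illustration and say nothing about how the author's method works.

```python
def bits_per_param(memory_bytes: float, n_params: float) -> float:
    """Average bits available per parameter if the weights alone fill the memory."""
    return memory_bytes * 8 / n_params


def weight_memory_gb(n_params: float, bits: float) -> float:
    """Memory (in GB, 1 GB = 1e9 bytes) needed to store the weights at a given bit width."""
    return n_params * bits / 8 / 1e9


N_PARAMS = 70e9      # "70B model" from the quoted claim
BUDGET_BYTES = 16e9  # assumption: "~16GB" read as 16 * 10^9 bytes

budget_bits = bits_per_param(BUDGET_BYTES, N_PARAMS)
print(f"budget: {budget_bits:.2f} bits per parameter")

# Weights-only footprint at some common storage widths, for comparison.
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under those assumptions the budget works out to a bit under 2 bits per weight, and even plain 2-bit storage (17.5 GB) would not quite fit, which is why the question of whether the model was trained under that constraint matters.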
When an AI says things like that we call it “hallucination”.