Hacker News new | ask | show | jobs
by vessenes 661 days ago
Classic Wolfram — brilliant, reimplements / comes at a current topic using only cellular automata, and draws some fairly deep philosophical conclusions that are pretty intriguing.

The part I find most interesting is his proposal that neural networks largely work by “hitching a ride” on fundamental computational complexity, in practice sort of searching around the space of functions representable by an architecture for something that works. And, to the extent this is true, that puts explainability at fundamental odds with the highest value / most dense / best deep learning outputs — if they are easily “explainable” by inspection, then they are likely not using all of the complexity available to them.

I think this is a pretty profound idea, and it sounds right to me — it seems like a rich theoretical area for next-gen information theory, essentially are their (soft/hard) bounds on certain kinds of explainability/inspectability?

FWIW, there’s a reasonably long history of mathematicians constructing their own ontologies and concepts and then people taking like 50 or 100 years to unpack and understand them and figure out what they add. I think of Wolfram’s cellular automata like this, possibly really profound, time will tell, and unusual in that he has the wealth and platform and interest in boosting the idea while he’s alive.

6 comments

Agree. (D)NNs have a powerful but somewhat loose inductive bias. They're great at capturing surface-level complexity but often miss the deeper compositional structure. This looseness, in my opinion, stems from a combination of factors: architectures that are not optimally designed for the specific task at hand, limitations in computational resources that prevent us from exploring more complex and expressive models, and training processes that don't fully exploit the available information or fail to impose the right constraints on the fitting process.

The ML research community generally agrees that the key to generalization is finding the shortest "program" that explains the data (Occam's Razor / MDL principle). But directly searching for these minimal programs (architecture space, feature space, training space etc) is exceptionally dificult, so we end up approximating the search to look something like GPR or circuit search guided by backprop.

This shortest program idea is related to Kolmogorov complexity (arises out of classical Information Theory) - i.e. the length of the most concise program that generates a given string (because if your not operating on the shortest program, then there is looseness/or overfit!). In ML, the training data is the string, and the learned model is the program. We want the most compact model that still captures the underlying patterns.

(D)NNs have been super successful, their reliance on approximations suggests there's plenty of room for improvement in terms of inductive bias and more program-like representations. I think approaches that combine the flexibility of neural nets with the structured nature of symbolic representations will lead to more efficient and performant learning systems. It seems like a rich area to just "try stuff" in.

Leslie Valiant touches on some of the same ideas in his book "Probably approximately correct" which tries to nail down some of the computational phenomena associated with the emergent properties of reality (its heady stuff).

What is GPR?
Gaussian Process Regression (a form of Bayesian Optimisation to try and get to the right "answer"/parameter space sooner) - explained in some context here...

https://brendanhasz.github.io/2019/03/28/hyperparameter-opti...

In saying that random param search still works well enough in many cases.

> neural networks largely work by “hitching a ride” on fundamental computational complexity

If you look at what a biological neural network is actually trying to optimize for, you might be able to answer The Bitter Lesson more adeptly.

Latency is a caveat, not a feature. Simulating a biologically-plausible amount of real-time delay is almost certainly wasteful.

Leaky charge carriers are another caveat. In a computer simulation, you can never leak any charge (i.e. information) if you so desire. This would presumably make the simulation more efficient.

Inhibitory neurology exists to preserve stability of the network within the constraints of biology. In a simulation, resources are still constrained but you could use heuristics outside biology to eliminate the fundamental need for this extra complexity. For example, halting the network after a limit of spiking activity is met.

Learning rules like STDP may exist because population members learned experiences cannot survive across generations. If you have the ability to copy the exact learned experiences from prior generations into new generations (i.e. cloning the candidates in memory), this learning rule may represent a confusing distraction more than a benefit.

"Classic Wolfram — brilliant, reimplements / comes at a current topic using only cellular automata, and draws some fairly deep philosophical conclusions that are pretty intriguing."

Wolfram has a hammer and sees everything as a nail. But its a really interesting hammer.

> And, to the extent this is true, that puts explainability at fundamental odds with the highest value / most dense / best deep learning outputs — if they are easily “explainable” by inspection, then they are likely not using all of the complexity available to them.

Could you define explainability in this context?

The ability of humans, by inspection, to determine why a program is constructed in the way that it is vis-a-vis the goal/output.
hmm -- could you speak to the meaning of explanation here?

It seems youre talking about the ability of a researcher to diagnose the structure of the model?

> searching around the space of functions representable by an architecture for something that works

That’s… why we’re here?

> searching around the space of functions representable by an architecture for something that works

That’s… why we’re here? How else could we characterise what any learning algorithm does?