|
Classic Wolfram: brilliant, comes at a current topic using only cellular automata, and draws some fairly deep philosophical conclusions that are pretty intriguing. The part I find most interesting is his proposal that neural networks largely work by “hitching a ride” on fundamental computational complexity, in practice sort of searching around the space of functions representable by an architecture for something that works. And, to the extent this is true, that puts explainability at fundamental odds with the highest-value / most dense / best deep learning outputs: if they are easily “explainable” by inspection, then they are likely not using all of the complexity available to them. I think this is a pretty profound idea, and it sounds right to me. It seems like a rich theoretical area for next-gen information theory: essentially, are there (soft/hard) bounds on certain kinds of explainability/inspectability? FWIW, there’s a reasonably long history of mathematicians constructing their own ontologies and concepts and then people taking like 50 or 100 years to unpack and understand them and figure out what they add. I think of Wolfram’s cellular automata like this: possibly really profound, time will tell, and unusual in that he has the wealth and platform and interest in boosting the idea while he’s alive. |
The ML research community generally agrees that the key to generalization is finding the shortest "program" that explains the data (Occam's Razor / MDL principle). But directly searching for these minimal programs (over architecture space, feature space, training space, etc.) is exceptionally difficult, so we end up approximating the search with something like GPR or circuit search guided by backprop.
This shortest-program idea is related to Kolmogorov complexity (which arises out of algorithmic information theory), i.e. the length of the most concise program that generates a given string (because if you're not operating on the shortest program, then there is looseness or overfit!). In ML, the training data is the string, and the learned model is the program. We want the most compact model that still captures the underlying patterns.
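To make the "shortest program" intuition concrete, here is a rough sketch. True Kolmogorov complexity is uncomputable, so this uses the zlib-compressed length as a stand-in. That substitution is my assumption for illustration, and it only gives a crude upper bound. The helper name `compressed_len` and the two test strings are made up for the example.

```python
import random
import zlib


def compressed_len(data: bytes) -> int:
    """Bytes needed by zlib at max compression level.

    This is only a computable upper-bound proxy for Kolmogorov
    complexity, not the real (uncomputable) quantity.
    """
    return len(zlib.compress(data, 9))


random.seed(0)  # fixed seed so the run is repeatable
n = 10_000

# A string with an obvious short description: "repeat 'ab' 5000 times".
patterned = b"ab" * (n // 2)

# Pseudo-random bytes: no structure a general-purpose compressor can exploit.
noisy = bytes(random.getrandbits(8) for _ in range(n))

print("patterned:", compressed_len(patterned), "bytes")
print("noisy:    ", compressed_len(noisy), "bytes")
```

The patterned string should shrink to a tiny fraction of its length while the noisy one barely compresses at all. That is the same asymmetry the MDL view cares about: data with real structure admits a short "program", and noise does not.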
While (D)NNs have been super successful, their reliance on approximations suggests there's plenty of room for improvement in terms of inductive bias and more program-like representations. I think approaches that combine the flexibility of neural nets with the structured nature of symbolic representations will lead to more efficient and performant learning systems. It seems like a rich area to just "try stuff" in.
Leslie Valiant touches on some of the same ideas in his book "Probably Approximately Correct", which tries to nail down some of the computational phenomena associated with the emergent properties of reality (it's heady stuff).