Hacker News
by apl 2479 days ago
Their applications are tabular data (for which MLPs have never been the method of choice) and MNIST (which I could classify at 85% with a rusty nail), so it's not super impressive.

NNs and the associated toolkit shine with structured high-dimensional data where CNNs, RNNs, or modern shenanigans like Transformer networks excel. I sincerely doubt that these networks turn out to be reducible to polynomial regression in any practically useful sense of the notion. But who knows.


Terminology note: data like images and voice, which have strong spatial or temporal patterns, are actually referred to as "unstructured" data, while data you get from running "SELECT * FROM some_table" or the carefully designed variables of a clinical trial are referred to as "structured" data.

If this seems backwards to you (as it did to me at first), note that unstructured data can be captured raw from instruments like cameras and microphones, while structured data usually involves a programmer coding exactly what ends up in each variable.

As you say, deep neural networks based on CNNs are SOTA on unstructured image data and RNNs are SOTA on unstructured voice and text data, while tree models like random forests and boosted trees are usually SOTA on problems involving structured data. The reason seems to be that the inductive biases inherent to CNNs and RNNs, such as translation invariance, are a good fit for the natural structure of such data, while the strong ability of trees to find rules is well suited to data where every variable is cleanly and unambiguously coded.
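A minimal sketch of what "finding rules" means for a tree, assuming the simplest possible tree, a single split (a decision stump), rather than a random forest or boosted ensemble. It tries every threshold on every column and keeps the rule with the fewest misclassifications. The column names `age`, `dose_mg`, `smoker` and all the rows are hypothetical, made up for illustration.

```python
def best_stump(rows, labels, feature_names):
    """Search rules of the form 'x[f] <= t -> class a, else class b'.

    Returns (errors, feature_name, threshold, left_class, right_class).
    """
    best = None
    for f, name in enumerate(feature_names):
        # Candidate thresholds: every distinct value seen in this column.
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            # Each side predicts its majority class (ties and empty sides go to 0).
            left_class = int(sum(left) * 2 > len(left))
            right_class = int(sum(right) * 2 > len(right))
            errors = sum(y != left_class for y in left) + sum(
                y != right_class for y in right
            )
            # Strict '<' keeps the first rule found among equally good ones.
            if best is None or errors < best[0]:
                best = (errors, name, t, left_class, right_class)
    return best


# Hypothetical, cleanly coded "clinical trial" table: (age, dose_mg, smoker).
feature_names = ["age", "dose_mg", "smoker"]
rows = [
    (34, 20, 0),
    (51, 80, 1),
    (47, 60, 0),
    (29, 40, 1),
    (62, 100, 0),
    (45, 30, 0),
    (38, 70, 1),
    (55, 50, 1),
]
# Hypothetical outcome: a patient "responds" exactly when dose_mg > 50.
labels = [int(dose > 50) for _, dose, _ in rows]

result = best_stump(rows, labels, feature_names)
print(result)
```

On this toy table the search should recover `dose_mg <= 50` as a perfect split. A real tree repeats the same search recursively inside each side of the split, and ensembles like random forests and boosted trees combine many such trees.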

Yeah, that's right. Doing too little proofreading with HN comments...
NNs can approximate any continuous function on a compact domain, and so can polynomials (the Weierstrass approximation theorem). This is all proven math and has been known for a long time. I think of these as different facets of the same thing, each with its own tools for computing, analyzing, and understanding.
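For the polynomial side of this, the classical result is the Weierstrass approximation theorem: any continuous function on a closed interval can be uniformly approximated by polynomials. A minimal constructive sketch uses Bernstein polynomials on [0, 1], assuming a hypothetical kinked target `|x - 0.5|` and measuring error on an evenly spaced grid rather than exactly.

```python
from math import comb


def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1] as a callable."""
    # Coefficient k is f(k/n) times the binomial weight C(n, k).
    coeffs = [f(k / n) * comb(n, k) for k in range(n + 1)]

    def p(x):
        return sum(c * x**k * (1 - x) ** (n - k) for k, c in enumerate(coeffs))

    return p


def max_error(f, p, grid=1001):
    """Largest |f(x) - p(x)| over an evenly spaced grid on [0, 1]."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - p(x)) for x in xs)


def target(x):
    # Continuous on [0, 1] but not differentiable at x = 0.5.
    return abs(x - 0.5)


errors = {n: max_error(target, bernstein(target, n)) for n in (4, 16, 64, 256)}
for n, err in errors.items():
    print(f"degree {n:>3}: max error {err:.4f}")
```

The error does shrink as the degree grows, but for this kinked target only on the order of 1/sqrt(n), so knowing that an approximating polynomial exists says little about how cheaply one can be found.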
A hash table can also approximate any function, so the "universal function approximation" thing is a bit oversold. It isn't really what matters. What matters is how well methods generalize beyond the training data, and how much data they need to do this well.
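To make the generalization point concrete, here is a toy sketch under made-up assumptions: a hypothetical linear ground truth, noisy training inputs drawn only from [0, 5), and noise-free test inputs from [0, 10). The "hash table" model stores the mean label per bucket of width 0.1 (the bucket width and the 0.0 fallback for unseen buckets are arbitrary choices); the alternative is an ordinary least-squares line.

```python
import random

BUCKET = 0.1  # bucket width: finer buckets fit the training data more closely


def true_f(x):
    # Hypothetical ground truth, chosen only for this demo.
    return 3.0 * x + 1.0


def bucket(x):
    # Quantize a real input to an integer hash key.
    return round(x / BUCKET)


rng = random.Random(0)
# Training inputs cover only [0, 5); labels carry Gaussian noise (sd 0.5).
train_xs = [rng.uniform(0, 5) for _ in range(200)]
train = [(x, true_f(x) + rng.gauss(0, 0.5)) for x in train_xs]
# Noise-free test inputs span [0, 10), so about half lie outside the training range.
test = [(x, true_f(x)) for x in (rng.uniform(0, 10) for _ in range(200))]

# Hash-table "model": store the mean training label seen in each bucket.
sums = {}
for x, y in train:
    s, c = sums.get(bucket(x), (0.0, 0))
    sums[bucket(x)] = (s + y, c + 1)
table = {k: s / c for k, (s, c) in sums.items()}


def table_predict(x):
    # A bucket never seen in training has no stored value; fall back to 0.0.
    return table.get(bucket(x), 0.0)


# Ordinary least-squares line (closed form for a single input).
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
slope = sum((x - mx) * (y - my) for x, y in train) / sum(
    (x - mx) ** 2 for x, _ in train
)
intercept = my - slope * mx


def linear_predict(x):
    return slope * x + intercept


def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


results = {
    "table": (mse(table_predict, train), mse(table_predict, test)),
    "linear": (mse(linear_predict, train), mse(linear_predict, test)),
}
for name, (tr, te) in results.items():
    print(f"{name:>6}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Shrinking `BUCKET` makes the table track the training points ever more tightly, yet it still has no stored value for inputs it never saw, while the line's built-in assumption (linearity) is what lets it extrapolate.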