by brrrrrm 1090 days ago
I doubt it was obvious that scaling up would magically work. I suspect the experiments were limited for the sake of analytic simplicity rather than by computational constraints.
3 comments

The only ML I ever did was a single undergrad NN class around 2001. That was a long time ago, but I vaguely remember being taught that adding more nodes rarely helped: you would just overfit to your dataset and get worse results on items outside it, or worse, end up with a completely degenerate NN. In other words, best practice was to use the minimum number of nodes that would do the job.
The modern slow-but-scales way of coding them also wasn't prevalent.
Why couldn't mathematical proofs/models have predicted or revealed this to be the case back then?
On the contrary, there was a mathematical proof that a neural network with a single hidden layer and a nonlinearity is enough to approximate any continuous function (on a bounded domain, given enough hidden units). Using more than one hidden layer seemed a waste.

https://en.wikipedia.org/wiki/Universal_approximation_theore...
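
To make the one-hidden-layer claim concrete, here is a rough sketch (not the theorem's proof, just one way to see the idea). It builds a single-hidden-layer ReLU network by hand, with no training, that linearly interpolates a target function, and shows the worst-case error shrinking as the hidden layer gets wider. The names `build_network`, `predict` and `max_error`, the choice of ReLU, and the `sin` target on [0, 2π] are all my own assumptions for illustration.

```python
import math


def relu(z):
    return max(0.0, z)


def build_network(f, lo, hi, width):
    """Hand-construct a one-hidden-layer ReLU net with `width` hidden units
    that linearly interpolates f at width + 1 evenly spaced points on [lo, hi].

    Returns (bias, knots, coeffs): hidden unit i computes relu(x - knots[i])
    and contributes with output weight coeffs[i].
    """
    step = (hi - lo) / width
    knots = [lo + i * step for i in range(width)]
    values = [f(lo + i * step) for i in range(width + 1)]
    slopes = [(values[i + 1] - values[i]) / step for i in range(width)]
    # The output weight of unit i is the change in slope at knot i,
    # so the slopes accumulate correctly from left to right.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, width)]
    return values[0], knots, coeffs


def predict(net, x):
    bias, knots, coeffs = net
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))


def max_error(f, net, lo, hi, samples=1000):
    """Largest absolute error on an evenly spaced grid over [lo, hi]."""
    grid = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - predict(net, x)) for x in grid)


if __name__ == "__main__":
    for width in (4, 16, 64):
        net = build_network(math.sin, 0.0, 2 * math.pi, width)
        print(width, max_error(math.sin, net, 0.0, 2 * math.pi))
```

Note what this does and doesn't show: a wide enough single layer can get arbitrarily close, but it says nothing about how many units you need for harder functions, or whether training would actually find those weights.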

How??