Hacker News
by sandpaper26 1000 days ago
It's not a likely solution given how loss functions work, but in theory a single model could learn to perform exactly the function you describe. When you say "just do X" where X is any function (in this case, a piecewise function), a large enough model could do it.
2 comments

After some reflection, it's maybe more accurate to visualize this in reverse: all expert models see the problem and attempt a solution, and then some "manager" model decides which expert model has the best solution and outputs it.
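A toy sketch of that "reverse" picture, in case it helps. Everything here is made up for illustration: the three experts, the piecewise target they cover between them, and the hand-written manager rule. A real manager would be a learned model and would presumably look at the candidate solutions too, which this one ignores.

```python
from typing import Callable, List

# Hypothetical experts: each is only "right" on part of the input range.
def expert_low(x: float) -> float:
    return 0.0   # good answer for x < 0

def expert_mid(x: float) -> float:
    return x     # good answer for 0 <= x < 1

def expert_high(x: float) -> float:
    return 1.0   # good answer for x >= 1

EXPERTS: List[Callable[[float], float]] = [expert_low, expert_mid, expert_high]

def manager(x: float, candidates: List[float]) -> int:
    """Return the index of the candidate to output.

    Hand-written stand-in for a learned manager (an assumption of this
    sketch): it receives the candidates but decides from x alone.
    """
    if x < 0:
        return 0
    if x < 1:
        return 1
    return 2

def experts_and_manager(x: float) -> float:
    # Every expert sees the problem and attempts a solution...
    candidates = [expert(x) for expert in EXPERTS]
    # ...then the manager picks which attempt gets output.
    return candidates[manager(x, candidates)]
```

Note that every expert runs on every input here; only the selection step differs per input.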
Until the manager model decides to outsource the expert models.
In theory you need two layers to model any function. In practice this is wildly different.
Any memoryless continuous function between two Euclidean spaces, I think you mean. The experts-and-manager model would need to be able to do more than that (as do most neural networks).

And part of the reason why single-hidden-layer networks aren't enough even in continuous memoryless Euclidean cases is, again, because of how loss functions work; you're unlikely to converge on a good approximation with very few hidden layers.
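To make the "in theory" half concrete, here is a minimal sketch of a single-hidden-layer ReLU network whose weights are set by hand so that it reproduces a simple piecewise function (0 below 0, x on [0, 1], 1 above 1). The target function and the weights are my own illustrative choices, not anything from upthread. It only shows that such weights exist; it says nothing about whether gradient descent on a loss would actually find them.

```python
from typing import List, Tuple

def relu(z: float) -> float:
    return max(0.0, z)

# One hidden layer with two ReLU units, as (weight, bias) pairs.
# Hand-picked for illustration, not learned.
HIDDEN: List[Tuple[float, float]] = [(1.0, 0.0), (1.0, -1.0)]
# Output weights, one per hidden unit (no output bias).
OUTPUT: List[float] = [1.0, -1.0]

def one_hidden_layer(x: float) -> float:
    # Computes relu(x) - relu(x - 1).
    hidden = [relu(w * x + b) for w, b in HIDDEN]
    return sum(v * h for v, h in zip(OUTPUT, hidden))

def target(x: float) -> float:
    # The piecewise function: 0 for x < 0, x for 0 <= x <= 1, 1 for x > 1.
    return min(1.0, max(0.0, x))
```

For a piecewise-linear target like this one the match is exact up to floating-point rounding; for a general continuous function a single hidden layer only gives an approximation, and typically needs many more units.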