by ozankabak
2056 days ago
You are talking about Sprecher's modification to the original Kolmogorov-Arnold theorem, right? This version, and its implications, have puzzled me for quite a while. Are you aware of any research on 3-layer networks where the unknown transfer function is also learnable? I suspect such an approach does not result in good models (otherwise we would have known about them!), but I cannot articulate why. Where exactly does the K-A reasoning fail when we try to apply it in practice?
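For reference, the classical Kolmogorov-Arnold representation in its usual textbook notation (the symbols below are not from this thread): every continuous function f on [0,1]^n can be written with continuous univariate functions as

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

Read as a network, the inner sums form a hidden layer of 2n+1 units and the outer functions \Phi_q act as per-unit transfer functions that depend on f. Sprecher's variant is usually described as building the inner functions from shifted and scaled copies of a single function, which is what makes the "one unknown transfer function" reading possible.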
I was unaware, but apparently Griebel gave a constructive proof in 2009 (linked from the Wikipedia article on the K-A representation theorem). I would have to read it, and hope I am not too rusty to understand it, before I could really ponder your question...
But I can offer two places where I would look:
1. The approximation is of a continuous function, and such approximations (e.g. Chebyshev, Bernstein) usually require that you be able to sample the function at specific points, but learning usually gives you training data that does not correspond to those specific points. It's possible that the construction fails here somehow.
2. The approximation is too hard in practice. This is too often the case for Breiman's beautiful ACE (Alternating Conditional Expectations), which, if you squint hard enough, looks like a two-layer network where each neuron has its own transfer function. The algorithm is incredibly simple in theory, but very hard to use in practice.
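
To make the ACE comparison in point 2 concrete, here is a minimal sketch of the alternating loop, not a faithful implementation. Everything in it is an assumption for illustration: the names `smooth` and `ace`, the toy data, and especially the crude running-mean smoother, which stands in for the much more careful smoother the real algorithm relies on. Each column of `phi` plays the role of one "neuron" with its own learned transfer function, and `theta` is the learned transform of the response.

```python
import numpy as np

def smooth(x, target, window=21):
    """Crude running-mean estimate of E[target | x]; window must be odd.
    (Assumption: a stand-in for the smoother used by real ACE.)"""
    order = np.argsort(x)
    kernel = np.ones(window) / window
    # Pad with edge values so the output has the same length as the input.
    padded = np.pad(target[order], window // 2, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")
    out = np.empty_like(smoothed)
    out[order] = smoothed  # undo the sort
    return out

def ace(X, y, n_iter=20, window=21):
    """Alternate between updating each phi_j(x_j) and theta(y)."""
    n, p = X.shape
    theta = (y - y.mean()) / y.std()
    phi = np.zeros((n, p))
    for _ in range(n_iter):
        for j in range(p):
            # Residual of theta once all other transformed inputs are removed.
            partial = theta - phi.sum(axis=1) + phi[:, j]
            phi[:, j] = smooth(X[:, j], partial, window)
            phi[:, j] -= phi[:, j].mean()
        # Update the response transform, then renormalise to unit variance.
        theta = smooth(y, phi.sum(axis=1), window)
        theta = (theta - theta.mean()) / theta.std()
    return theta, phi

# Toy data (hypothetical): an additive model hidden behind an exponential.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.exp(X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.standard_normal(500))

theta, phi = ace(X, y)
corr = np.corrcoef(theta, phi.sum(axis=1))[0, 1]
print(f"correlation between theta(y) and sum of phi_j(x_j): {corr:.3f}")
```

Even in this toy form, the whole algorithm is two nested loops over a smoother, which is the "simple in theory" part. The "hard in practice" part lives almost entirely inside `smooth`: the window size, behaviour at the edges, and sensitivity to noise all affect what the learned transfer functions look like.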