The paper [0] is pretty good for handling questions like those.

> doesn't KA representation require continuous univariate functions?

All multivariate continuous functions (on a bounded domain) can be represented as compositions of addition and univariate continuous functions. Much like an MLP, you can also approximate discontinuous functions well on most of the domain (learning a nearby continuous function instead).

> do B-splines actually cover the space of all continuous functions

Much like an MLP, you can hit your favorite accuracy bound with more control points.

> wouldn't... MLPs be better for the learnable activation functions

Perhaps. A B-spline is comparatively very fast to compute. Also, local training examples have global impacts on an MLP's weights. That's good and bad. One property you would expect while training a KAN in limited data regimes is that some control points are never updated, leading to poor generalization due to something like a phase shift as you cross over control points (I think the entropy-based regularizer they have in the paper probably solves that, but YMMV). The positive side of that coin is that you neatly side-step catastrophic forgetting.

[0] https://arxiv.org/abs/2404.19756
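To make the "some control points are never updated" point concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the helper `bspline_basis`, the uniform grid of 10 intervals, the cubic degree, and the toy target `sin(2*pi*x)` are all assumptions chosen for illustration. It builds the B-spline basis with the Cox-de Boor recursion, draws training inputs only from the left part of the domain, and checks which control points get an exactly zero gradient.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion.

    Hypothetical helper for illustration; assumes strictly increasing knots.
    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        left_den = knots[d:-1] - knots[:-d - 1]    # t_{i+d} - t_i
        right_den = knots[d + 1:] - knots[1:-d]    # t_{i+d+1} - t_{i+1}
        left = (x[:, None] - knots[None, :-d - 1]) / left_den * B[:, :-1]
        right = (knots[None, d + 1:] - x[:, None]) / right_den * B[:, 1:]
        B = left + right
    return B

degree = 3        # cubic spline (assumed)
n_intervals = 10  # grid intervals on [0, 1] (assumed)
h = 1.0 / n_intervals
# Uniform knots, extended by `degree` intervals on each side of [0, 1].
knots = np.arange(-degree, n_intervals + degree + 1) * h

# Limited-data regime: training inputs only cover [0, 0.29].
x_train = np.linspace(0.0, 0.29, 50)
y_train = np.sin(2 * np.pi * x_train)  # toy target (assumed)

B = bspline_basis(x_train, knots, degree)  # shape (50, 13)
coeffs = np.zeros(B.shape[1])              # control points, initialised to zero

# Gradient of the loss 0.5 * mean((B @ coeffs - y)^2) w.r.t. the control points.
grad = B.T @ (B @ coeffs - y_train) / len(x_train)

# Control points whose basis function is zero on every training input.
untouched = np.flatnonzero(~B.any(axis=0))
print("control points with no training signal:", untouched)
print("their gradient entries:", grad[untouched])
```

Under these assumptions, every control point whose basis function is supported entirely to the right of the sampled region sees a zero gradient, so gradient descent leaves it at its initial value no matter how long you train. An MLP used as the activation function would not have this property, since every weight typically receives some gradient from every sample.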
It’s weird to just ignore MLPs when approximating a continuous univariate function. But if the paper had used MLPs, they’d have ended up with something that looks a lot more like a conventional neural network, so maybe that’s why?