by hodgehog11 1 day ago
I generally agree with this rebuttal. Each KAN layer is more expressive on a per-layer basis, although there is a mapping to an MLP with more layers. With the current hardware implementations, yes, MLPs have an advantage overall. I can certainly respect the intention to make KANs faster, since it is a serious issue for more widespread adoption, and KANs certainly have their value.

I'm still very skeptical of arguing for KANs as an eventual replacement, as some papers on the subject do. The reduced depth may not be an advantage. For example, greater depth in standard neural networks doesn't just add expressivity; it also induces a spectral sparsity bias. KANs have a bias of their own, but it is a different one, sometimes better and sometimes worse depending on the task. If increasing depth turns out to be important, KANs might remain less efficient overall.
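
For readers who want the per-layer comparison spelled out, here is a rough sketch of the two layer types in formulas. The symbols (x_i for inputs, y_j for outputs, n for the input width) are my own notation rather than anything from the thread, and the MLP form assumes a single fixed activation σ.

```latex
% MLP layer: learned weights w_{j,i} and bias b_j, fixed activation \sigma on the nodes
y_j = \sigma\Big(\sum_{i=1}^{n} w_{j,i}\, x_i + b_j\Big)

% KAN layer: learned univariate functions \phi_{j,i} on the edges, nodes only sum
y_j = \sum_{i=1}^{n} \phi_{j,i}(x_i)
```

Each φ is a function of one variable, so it can in principle be approximated by a small one-input MLP. That is one way to read the "mapping to an MLP with more layers" remark above, though the thread does not specify which mapping is meant.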


Ah I see, that's an interesting point about higher depth potentially having other benefits. For our work on smaller models (generally fewer than 5 layers), this might not have been as relevant, but I would definitely be interested to see the implications for much deeper networks. As to your point about KANs performing better or worse depending on the specific task, we definitely did notice this to some extent (symbolic tasks were the best, non-symbolic tasks such as image recognition were the worst).

> symbolic tasks were the best, non-symbolic tasks such as image recognition were the worst

I wonder how much of that is due not to the overall task but to the need to build up to a complex state where KANs can excel. Consider the classic neural-net edge detector example: it's hard to imagine a KAN doing that task more efficiently. Edge detection is a necessary part of the overall process, but delegating a menial task to a more capable system is probably a waste of resources.

One layer of conv2d might be enough to turn pixels into something that KANs manage better.

This is definitely true: one could imagine a model that mixes the two layer types, or a simple linear / MLP-like kernel doing "preprocessing" before the KAN layers. Other work comparing task performance for KANs and MLPs generally finds that KANs are worse at non-symbolic tasks, but it would be interesting to see whether hybrid architectures could improve on this failure mode.
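
To make the hybrid idea concrete, here is a minimal pure-Python sketch of a conv2d front end feeding a single KAN-style layer. Everything in it is an assumption for illustration: the function names are hypothetical, the kernel is a fixed Sobel filter rather than a learned one, and the edge functions are piecewise-linear on a shared grid instead of the splines used in actual KAN implementations. It is not the architecture from any paper discussed here.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation with no padding (the operation DL libraries call conv2d)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + a][c + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def piecewise_linear(x, grid, values):
    """Evaluate a 1-D function given by its values on a sorted grid, clamped at the ends."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    for k in range(len(grid) - 1):
        if x <= grid[k + 1]:
            t = (x - grid[k]) / (grid[k + 1] - grid[k])
            return (1 - t) * values[k] + t * values[k + 1]


def kan_layer(x, grid, edge_values):
    """KAN-style layer: edge_values[j][i] holds the grid values of the 1-D function
    on the edge from input i to output j; each output node just sums its edges."""
    return [
        sum(piecewise_linear(xi, grid, phi) for xi, phi in zip(x, row))
        for row in edge_values
    ]


def hybrid_forward(image, kernel, grid, edge_values):
    """Conv front end (cheap local features) followed by one KAN-style layer."""
    feature_map = conv2d_valid(image, kernel)
    features = [v for row in feature_map for v in row]  # flatten row by row
    return kan_layer(features, grid, edge_values)


# Hypothetical toy input: a 3x5 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # fixed horizontal-gradient kernel

grid = [0.0, 4.0]
# One output node, three inputs; each edge function is phi(x) = x / 4 on [0, 4].
edge_values = [[[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]]

print(conv2d_valid(image, SOBEL_X))                       # [[4, 4, 0]]
print(hybrid_forward(image, SOBEL_X, grid, edge_values))  # [2.0]
```

In a real experiment both the kernel and the edge-function values would be trained, and whether this split actually helps on image tasks is exactly the open question raised above.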