| HN Mirror



	by Lerc 51 days ago
	Has there been much exploration of how much benefit comes from precision in the activation functions of KANs? There's a little niggle in the back of my head that maybe 90% of the benefit of KANs can be gained from a quite small variety of function shapes. Combined with input weighting, I almost feel you could have a representation that scales from a standard ReLU perceptron through KANs to something with weighted inputs and fancy weighted activation functions. Mark that out in 2D, with axes of input weight precision and activation weight precision, and you could perhaps do sweeps to find the best accuracy per parameter bit, or accuracy/speed, or some sweet spot that balances operating speed, accuracy, and model size.
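The kind of 2D sweep suggested here could be set up roughly as follows. This is only a toy sketch under loud assumptions: a single weighted input feeding a piecewise-linear activation stands in for a real model, the target shape and the weight 0.73 are arbitrary, and the names `quantize`, `toy_unit`, and `sweep` are made up for the illustration. Bit widths must be at least 2.

```python
import numpy as np

def quantize(v, bits, vmax):
    """Uniform symmetric quantization of v to `bits` bits over [-vmax, vmax]."""
    levels = 2 ** (bits - 1) - 1
    return np.round(np.clip(v, -vmax, vmax) / vmax * levels) / levels * vmax

def toy_unit(x, w, coeffs):
    """One weighted input feeding a piecewise-linear activation on [-1, 1]."""
    grid = np.linspace(-1.0, 1.0, len(coeffs))
    return np.interp(np.clip(w * x, -1.0, 1.0), grid, coeffs)

def sweep(weight_bits, act_bits, w=0.73, n_coeffs=9):
    """Return {(bw, ba): (mse, parameter_bits)} for every precision pair."""
    x = np.linspace(-1.0, 1.0, 201)
    # Arbitrary toy activation shape (assumption, not taken from any paper).
    coeffs = np.sin(np.pi * np.linspace(-1.0, 1.0, n_coeffs))
    reference = toy_unit(x, w, coeffs)  # full-precision output
    results = {}
    for bw in weight_bits:
        for ba in act_bits:
            approx = toy_unit(x, quantize(w, bw, 1.0), quantize(coeffs, ba, 1.0))
            mse = float(np.mean((approx - reference) ** 2))
            # One weight at bw bits plus n_coeffs activation values at ba bits.
            results[(bw, ba)] = (mse, bw + ba * n_coeffs)
    return results
```

Calling `sweep([2, 4, 8], [2, 4, 8])` gives one (error, bit count) pair per grid cell, which is the raw material for an accuracy-per-parameter-bit comparison. A real study would replace the toy unit with a trained network and a held-out metric.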

2 comments

ag2718 51 days ago

There is definitely a precision-performance tradeoff to consider. We explored this through ablation studies on bitwidth precision / resource usage in our work (Figure 6a in https://arxiv.org/pdf/2512.12850, Figure 4 in https://arxiv.org/pdf/2602.02056). Further exploration into the mechanics here would definitely be useful.

Regarding your point that "90% of the benefit of KANs can be gained from a small variety of function shapes": even within the B-spline basis, the shapes are quite uniform. Much of the actual benefit of scaling up the basis size comes from learning more complex, piecewise-polynomial activation functions. Scaling up the number of basis functions (i.e. more granular intervals) also increases locality and allows the activation function's value across different parts of the domain to be learned semi-independently. (There obviously is a tradeoff here with overfitting.)

As for your point that "you could have a representation that scales from a standard ReLU perceptron through KANs to something with weighted inputs and fancy weighted activation functions": the number of basis functions (G+S) is largely what determines how expressive the activation is.
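The link between basis count and locality described above can be sketched in NumPy. This is a minimal illustration rather than the implementation from the linked papers: it assumes G is the number of grid intervals and S the spline degree (the comment does not define them), uses a uniform knot vector on [-1, 1], and the function name `bspline_basis` is hypothetical.

```python
import numpy as np

def bspline_basis(x, G, S, lo=-1.0, hi=1.0):
    """Evaluate all G + S B-spline basis functions of degree S at points x.

    Assumed meaning: G grid intervals on [lo, hi], spline degree S >= 1.
    Returns an array of shape (len(x), G + S).
    """
    x = np.asarray(x, dtype=float)
    h = (hi - lo) / G
    # Uniform knots extended by S intervals on each side: G + 2S + 1 knots.
    knots = lo + h * np.arange(-S, G + S + 1)
    # Degree 0: indicator of each half-open knot interval.
    B = ((x[:, None] >= knots[None, :-1]) &
         (x[:, None] < knots[None, 1:])).astype(float)
    # Cox-de Boor recursion raises the degree one step at a time.
    for k in range(1, S + 1):
        left = (x[:, None] - knots[None, :-(k + 1)]) / (
            knots[None, k:-1] - knots[None, :-(k + 1)])
        right = (knots[None, k + 1:] - x[:, None]) / (
            knots[None, k + 1:] - knots[None, 1:-k])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B
```

An activation is then a weighted sum, `bspline_basis(x, G, S) @ coeffs`, with G + S learnable coefficients. At most S + 1 basis functions are nonzero at any x, so each coefficient only influences a local stretch of the domain, and increasing G shrinks that stretch.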


zipy124 51 days ago

Can I just say that this is extremely impressive work for a master's level thesis. Incredible work and I hope you manage to continue fulfilling your fantastic potential in your career!


ag2718 51 days ago

Thank you :)


hodgehog11 51 days ago

The benefit of KANs is interpretability, not expressivity. It's a structure that lends itself well to performing symbolic regression or other interpretable downstream tasks. This can make it better suited for scientific tasks, for example. You can easily replicate the practical performance of any KAN with an MLP, and it will train and run faster on modern architectures. This work proposes a method that might make KANs faster, but it looks like early days to me.

Precision in the activation function is targeting a part of neural networks that you don't want. There are many other methods that work with high precision. You use neural networks because of their implicit bias toward regular solutions. That means there is a sweet spot at low precision that you're targeting.


ag2718 51 days ago

A key benefit of KANs is expressivity, as each layer is significantly more expressive than an MLP layer. This can be seen in our benchmarks: KAN networks need fewer layers than MLPs to match or beat their performance, even in software.

However, on GPUs, KAN implementations are far less efficient than MLPs, since B-spline locality is hard to exploit and lookup operations aren't as efficient. This is in line with your original point about MLPs training and running faster on modern architectures: each KAN layer is more expressive, but its poor hardware efficiency makes KANs a net negative there (at least for current approaches).

On FPGAs, LUT lookups are cheap, so KANs' expressive layers map to very hardware-efficient implementations, and the resulting networks are thus much more compact and efficient than equivalent MLPs.
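To make the lookup idea concrete, here is a software-only sketch of turning a 1D activation into a table indexed by a quantized input. It is an assumption-laden illustration, not the scheme from the linked papers: the function names, the input range [-1, 1], and the uniform quantizers are all hypothetical choices, and a real FPGA design would bake the table into logic instead of indexing an array.

```python
import numpy as np

def build_lut(activation, in_bits, out_bits, lo=-1.0, hi=1.0):
    """Tabulate `activation` at all 2**in_bits input codes.

    Outputs are stored as signed out_bits integers sharing one scale factor.
    """
    codes = np.arange(2 ** in_bits)
    xs = lo + (hi - lo) * codes / (2 ** in_bits - 1)
    ys = activation(xs)
    scale = max(np.max(np.abs(ys)), 1e-12) / (2 ** (out_bits - 1) - 1)
    table = np.round(ys / scale).astype(np.int32)
    return table, scale

def quantize_input(x, in_bits, lo=-1.0, hi=1.0):
    """Map real inputs to integer codes in [0, 2**in_bits - 1]."""
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * (2 ** in_bits - 1)).astype(np.int64)

def lut_eval(x, table, scale, in_bits, lo=-1.0, hi=1.0):
    """Evaluate the tabulated activation: one index lookup per input."""
    return table[quantize_input(x, in_bits, lo, hi)] * scale
```

For example, `table, scale = build_lut(np.tanh, 6, 8)` builds a 64-entry table, with `np.tanh` standing in for a learned spline activation. Once the table exists, evaluation cost no longer depends on how complicated the activation is, only on the input bit width.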

On your second point: low precision is certainly viable for both inference and learning (as shown in our work), and quantization can even have a mild regularizing effect. However, task performance generally worsens with lower precision (here and across the literature): the use of low precision is fundamentally a result of the efficiency-performance tradeoff.


hodgehog11 51 days ago

I generally agree with this rebuttal. Each KAN layer is more expressive on a per-layer basis, although there is a mapping to an MLP with more layers. With the current hardware implementations, yes, MLPs have an advantage overall. I can certainly respect the intention to make KANs faster, since it is a serious issue for more widespread adoption, and KANs certainly have their value.

I'm still very skeptical of arguing for KANs as an eventual replacement, as I've seen some papers on the subject do. The reduced depth may not be an advantage. For example, higher depth for standard neural networks doesn't just add to expressivity; it actually induces a spectral sparsity bias. KANs have a bias of their own, but it is different, and is sometimes better, sometimes worse, depending on the task. If increasing depth turns out to be important, KANs might remain less efficient overall.


ag2718 51 days ago

Ah I see, that's an interesting point about higher depth potentially having other benefits. For our work on smaller models (generally <5 layers), this might not have been as relevant, but I would definitely be interested to see the implications for much deeper networks. As to your point about KANs performing better or worse depending on the specific task, we definitely did notice this to some extent (symbolic tasks were the best, non-symbolic tasks such as image recognition were the worst).


Lerc 50 days ago

>symbolic tasks were the best, non-symbolic tasks such as image recognition were the worst

I wonder how much of that is down not to the overall task but to the need to build up to a complex state where KANs can excel. Consider the classic neural-net edge detector example: it's hard to imagine a KAN doing that task more efficiently. Edge detection seems like a necessary part of the overall process, but delegating a menial task to a more capable system is probably a waste of resources.

One layer of conv2d might be enough to turn pixels into something that KANs manage better.


ag2718 50 days ago

This is definitely true: one could imagine a model with a mix of the two layer types, or a simple linear / MLP-like kernel doing "preprocessing" before the KAN layers. Other work that compares task performance for KANs and MLPs generally finds KANs are worse at non-symbolic tasks, but it would be interesting to see if hybrid architectures could improve on this failure mode.
