by SpaceManNabs 782 days ago
How does back propagation work now? Do these suffer from vanishing or exploding gradients?
2 comments

On page 6 the paper explains how they did backpropagation: https://arxiv.org/pdf/2404.19756 (and on page 2 it says that previous efforts to leverage the Kolmogorov-Arnold representation did not use backpropagation), so maybe using backpropagation to train multilayer networks with this architecture is their main contribution?

> Unsurprisingly, the possibility of using Kolmogorov-Arnold representation theorem to build neural networks has been studied [8, 9, 10, 11, 12, 13]. However, most work has stuck with the original depth-2 width-(2n + 1) representation, and did not have the chance to leverage more modern techniques (e.g., back propagation) to train the networks. Our contribution lies in generalizing the original Kolmogorov-Arnold representation to arbitrary widths and depths, revitalizing and contextualizing it in today’s deep learning world, as well as using extensive empirical experiments to highlight its potential role as a foundation model for AI + Science due to its accuracy and interpretability.

No, the activations are a combination of the basis function and the spline function. It's still a little unclear to me how the grid works, but it seems like this shouldn't suffer any more than a generic ReLU MLP.
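
For anyone trying to picture that activation, here's a minimal sketch, assuming my reading of the paper is right: each edge applies phi(x) = w_b * silu(x) + w_s * spline(x), where the spline is a weighted sum of B-spline basis functions defined on a grid of knots. The function names, the uniform grid on [-1, 1], and the default sizes are all made up for illustration; this is not the authors' code.

```python
import math

def silu(x):
    # Basis ("residual") term: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def make_knots(grid_size=5, k=3, lo=-1.0, hi=1.0):
    # Hypothetical helper: uniform grid on [lo, hi], extended by k knots
    # on each side so that degree-k splines cover the whole interval.
    h = (hi - lo) / grid_size
    return [lo + (j - k) * h for j in range(grid_size + 2 * k + 1)]

def bspline_basis(x, knots, i, k):
    # Cox-de Boor recursion: value at x of the i-th B-spline of degree k.
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0
    if left_den != 0.0:
        left = (x - knots[i]) / left_den * bspline_basis(x, knots, i, k - 1)
    right = 0.0
    if right_den != 0.0:
        right = (knots[i + k + 1] - x) / right_den * bspline_basis(x, knots, i + 1, k - 1)
    return left + right

def kan_activation(x, coeffs, knots, k=3, w_b=1.0, w_s=1.0):
    # coeffs would be the learnable spline coefficients (one per basis function);
    # w_b and w_s scale the basis term and the spline term (assumed form).
    spline = sum(c * bspline_basis(x, knots, i, k) for i, c in enumerate(coeffs))
    return w_b * silu(x) + w_s * spline

# Usage: 5 grid intervals and cubic splines give 5 + 3 = 8 coefficients.
knots = make_knots()
coeffs = [0.1 * i for i in range(8)]
y = kan_activation(0.3, coeffs, knots)
```

In this sketch the spline term is exactly zero once x leaves the knot range, so only the SiLU term contributes out there. That at least fits the intuition that the gradient behaviour shouldn't be worse than a plain MLP with a smooth activation, though it's my own illustration and not a claim from the paper.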