by thesz 642 days ago
KANs have an O(N^(-4)) scaling law for the loss, where N is the number of parameters. MLPs scale as O(N^(-1)) or worse.

Where you need an MLP with tens of billions of parameters, you may need a KAN with only thousands.
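A rough back-of-the-envelope sketch of that arithmetic, assuming both laws are exact power laws with the same constant factor (which real models will not satisfy, so treat the output as order-of-magnitude only). The function name `equivalent_kan_params` is made up for illustration:

```python
def equivalent_kan_params(mlp_params: float,
                          kan_exp: float = 4.0,
                          mlp_exp: float = 1.0) -> float:
    """Parameter count at which a KAN matches an MLP's loss.

    Assumes loss_kan = N_kan ** -kan_exp and loss_mlp = N_mlp ** -mlp_exp
    with identical (hypothetical) constant factors, then solves
    N_kan ** -kan_exp == N_mlp ** -mlp_exp for N_kan.
    """
    return mlp_params ** (mlp_exp / kan_exp)


if __name__ == "__main__":
    # MLP sizes from ten billion up to a trillion parameters.
    for n_mlp in (1e10, 5e10, 1e12):
        n_kan = equivalent_kan_params(n_mlp)
        print(f"MLP with {n_mlp:.0e} params ~ KAN with {n_kan:,.0f} params")
```

With equal constants, an MLP with 10^10 parameters maps to a KAN with about 10^2.5 (roughly 316) parameters, and 10^12 maps to 1000. Different constant factors shift these figures, but the fourth-root relationship is what drives the gap.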