by kbr 596 days ago
NNs have complex non-convex loss functions that don't admit a closed-form solution. Even for small models, it can be shown that training is an NP-complete problem. In fact, even for linear regression (least squares), which does have a closed-form solution, it can be computationally cheaper to run gradient descent, since the closed-form solution requires you to compute and invert a large matrix (X^T X).
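To make the least-squares point concrete, here's a rough sketch in plain Python for a one-feature fit (y ≈ w0 + w1*x). The function names, the toy data, the learning rate and the step count are all made up for illustration. With only two parameters the normal equations are trivially cheap; the trade-off only shows up with d features, where forming X^T X costs on the order of n*d^2 and solving it about d^3, versus roughly n*d per gradient step.

```python
def fit_closed_form(xs, ys):
    """Solve the 2x2 normal equations (X^T X) w = X^T y by hand."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    w0 = (sxx * sy - sx * sxy) / det
    w1 = (n * sxy - sx * sy) / det
    return w0, w1


def fit_gradient_descent(xs, ys, lr=0.1, steps=5000):
    """Minimize mean squared error with plain full-batch gradient descent."""
    n = len(xs)
    w0 = w1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = (w0 + w1 * x) - y
            g0 += err
            g1 += err * x
        # Gradient of the mean squared error is (2/n) * sum(err * feature).
        w0 -= lr * 2 * g0 / n
        w1 -= lr * 2 * g1 / n
    return w0, w1


# Hypothetical toy data: y = 2 + 3x plus a small alternating wiggle.
xs = [i / 19 for i in range(20)]
ys = [2 + 3 * x + 0.1 * (-1) ** i for i, x in enumerate(xs)]

print(fit_closed_form(xs, ys))
print(fit_gradient_descent(xs, ys))
```

Both should land on essentially the same weights; the difference is only in how the work scales as the number of features grows.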

Which in some sense is intuitive: any closed form that can model general computation to any significant degree should be hard to evaluate. If it weren't, you could encode your NP-complete problem into it, solve it in an efficient closed form, and collect your Fields Medal for proving P = NP.
Intuition is often wrong, even for high-IQ people like your average HN user. lol.

For a long time it was intuitive that you cannot find the area under arbitrary functions, but then calculus was invented, showing us a new "trick" that was previously unfathomable and indistinguishable from magic.

I'm just not sure mankind's understanding of Mathematics is out of new "tricks" to be learned. I think there are types of algorithms today that look like they require N iterations to get X precision, when in reality we might be able to divide N by some factor, for some algorithms, and still end up with X precision.
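A textbook case where this has already happened, as a quick sketch: finding a square root by bisection versus Newton's method. The function names and the tolerance are just picked for illustration, and the two loops use slightly different stopping rules (interval width vs. residual), so treat the step counts as a rough comparison rather than a benchmark.

```python
def bisection_sqrt(a, tol=1e-12):
    """Halve a bracketing interval until it is narrower than tol."""
    lo, hi = 0.0, max(1.0, a)
    steps = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * mid < a:
            lo = mid
        else:
            hi = mid
        steps += 1
    return (lo + hi) / 2, steps


def newton_sqrt(a, tol=1e-12):
    """Newton's method on f(x) = x^2 - a, stopping when |x^2 - a| <= tol."""
    x = max(1.0, a)
    steps = 0
    while abs(x * x - a) > tol:
        x = (x + a / x) / 2
        steps += 1
    return x, steps


print(bisection_sqrt(2.0))
print(newton_sqrt(2.0))
```

Bisection gains about one bit per step, while Newton roughly doubles the number of correct digits per step once it's close, so it needs far fewer iterations for the same precision.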

> I'm just not sure mankind's understanding of Mathematics is out of new "tricks" to be learned.

This is my opinion also as it relates to AI/ANNs. Things I've read about how scientists see the brain shifting due to learning (minimum energy of network type stuff) suggest that the brain has some functions figured out that we haven't identified yet.

Maybe it's math already fully understood, just not applied well to ANNs, but maybe there's some secret sauce in there.

One reason to believe there's even new low-hanging fruit (that doesn't even require new math) is how simple and trivial the "Attention Heads" structure of the Transformer architecture really is. It's not advanced at all. It was just a great idea that panned out, one that pretty much any creative AI researcher could've thought up after smokin' a joint. lol. I mean someone could do trivial experiments with different Perceptron network structuring and end up revolutionizing the world.
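To show how little is in there, here's a minimal sketch of a single scaled dot-product attention head in plain Python, with no masking, no batching and no multi-head concatenation. The helper names and the tiny identity weight matrices are made up for illustration; real implementations use a tensor library and learned weights.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def attention_head(x, wq, wk, wv):
    """softmax(Q K^T / sqrt(d_k)) V for one head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Hypothetical toy input: 3 tokens with 2 features, identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out, weights = attention_head(x, eye, eye, eye)
print(out)
print(weights)
```

That's three matrix multiplies, a scaling, a softmax and one more multiply; each output row is just a weighted average of the value rows.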

I think things are gonna get interesting real quick once LLMs themselves start "self experimenting" with writing code for different architectures.

Thanks for that great clarification. I had seen all those words before, but just not in that particular order. haha.

Maybe our only hope of doing LLM training runs in a tiny amount of time will be from Quantum Computing or even Photonic (wave-based) Computing.