by nbraem 858 days ago
This is because deep learning is currently not a science but an engineering discipline. There is no underlying theory that explains why deep neural networks generalize as well as they do. Classical learning theory based on VC dimension actually predicts that models with millions of parameters, often more than the number of training examples, should overfit rather than generalize.
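To make the VC point concrete, here is a rough Python sketch of the complexity term in one common textbook form of Vapnik's bound: with probability at least 1 - delta, test error <= training error + sqrt((d(ln(2n/d) + 1) + ln(4/delta)) / n). The function name `vc_gap`, the default delta = 0.05, and treating the VC dimension `d` as roughly the parameter count are my assumptions for illustration. A real network's VC dimension depends on its architecture.

```python
import math


def vc_gap(d: int, n: int, delta: float = 0.05) -> float:
    """Complexity term of one textbook form of the VC generalization bound.

    d: VC dimension of the model class (hypothetically ~ parameter count)
    n: number of training examples
    delta: allowed failure probability of the bound
    """
    if n <= d:
        # This form of the bound only applies when n > d; with at least as
        # much capacity as data it gives no guarantee, so treat it as vacuous.
        return math.inf
    return math.sqrt((d * (math.log(2 * n / d) + 1) + math.log(4 / delta)) / n)


small_model = vc_gap(10, 50_000)            # tiny model, plenty of data
big_model = vc_gap(1_000_000, 2_000_000)    # "a million parameters"
overparam = vc_gap(1_000_000, 50_000)       # more parameters than examples
print(small_model, big_model, overparam)
```

With d = 10 and n = 50,000 the gap is roughly 0.05. With a million "parameters" it exceeds 1 even on two million examples, and because error rates lie in [0, 1], the bound then says nothing. Yet overparameterized networks generalize well in practice.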

Some academics are working on this, but that effort pales in comparison with the money being poured into generative AI.

So today's state-of-the-art models are built through trial and error, guided by experts who develop an intuition for why some methods work and others don't.