Hacker News new | ask | show | jobs
by rich_sasha 1738 days ago
There is something called the golden ticket theory (maybe mentioned in the paper, I’m on my phone), that says indeed that the large models are effectively ensembles of massive random models, and the top levels of the network pick the one or two that randomly happen to work.

Maybe true but even then only part of the story, kernels in CNN genuinely seem to learn features like edges and textures.