Hacker News
by visarga 1254 days ago
True, but scaling has its own problems. To make it work, it was necessary to find better optimisers, activation functions, regularisers, weight sharing schemes, architectures and many other ingredients. It was also necessary to prepare the large datasets and to invent the whole stack of frameworks, from CUDA to HuggingFace.

We have had 250,000 ML papers written since 2012. That's a lower bound on the number of distinct experiments necessary to find the winning tickets of today. Inventing the step-activated neuron formula was less than 1% of the way here.