by Legend2440 1040 days ago
Because with infinite hardware I'd be able to do neural architecture search and find the optimal model architecture.
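With unlimited compute, neural architecture search can in principle be exhaustive: train and score every candidate, then keep the best. Here's a toy sketch of that idea. The search space and `proxy_score` are hypothetical stand-ins I made up; in real NAS, scoring means actually training each architecture, which is the part that eats the hardware.

```python
from itertools import product

# Hypothetical search space; options chosen purely for illustration.
SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [64, 128, 256],
    "activation": ["relu", "gelu"],
}

def proxy_score(arch):
    # Hypothetical stand-in for "train this model, return validation accuracy".
    # Bigger models score higher, with a small per-layer cost and a gelu bonus.
    bonus = 0.02 if arch["activation"] == "gelu" else 0.0
    size_term = 1.0 / (arch["depth"] * arch["width"]) ** 0.5
    return 1.0 - size_term - 0.001 * arch["depth"] + bonus

def exhaustive_nas(space, score):
    """Evaluate every architecture in the grid and return the best one."""
    keys = list(space)
    best_arch, best_score, evaluated = None, float("-inf"), 0
    for values in product(*(space[k] for k in keys)):
        arch = dict(zip(keys, values))
        s = score(arch)  # the expensive step in practice
        evaluated += 1
        if s > best_score:
            best_arch, best_score = arch, s
    return best_arch, best_score, evaluated

best_arch, best_score, evaluated = exhaustive_nas(SEARCH_SPACE, proxy_score)
print(best_arch, round(best_score, 4), evaluated)
```

Exhaustive search only works because the grid here has 18 points; real search spaces are astronomically larger, which is why practical NAS falls back on random search, evolution, or learned controllers.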

And I'd be able to train a learned optimizer to replace gradient descent as the training process.

Even without either of those, performance improves predictably with more compute, thanks to scaling laws: loss falls roughly as a power law in training compute.
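"Predictable" here means you can fit a power law L(C) = a * C^(-alpha) to small runs and extrapolate to bigger ones. A minimal sketch, assuming clean synthetic data: the constants `a=10.0` and `alpha=0.05` are made up for illustration, not values from any real scaling-law paper. A power law is a straight line in log-log space, so ordinary least squares on the logs recovers both parameters.

```python
import math

def power_law_loss(compute, a=10.0, alpha=0.05):
    # Hypothetical constants; real values are fit empirically per model family.
    return a * compute ** (-alpha)

def fit_power_law(computes, losses):
    """Fit L = a * C**(-alpha) by linear regression of log L on log C."""
    xs = [math.log(c) for c in computes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (a, alpha)

# "Small" runs from 1e15 to 1e21 FLOPs (synthetic, noise-free).
computes = [10.0 ** k for k in range(15, 22)]
losses = [power_law_loss(c) for c in computes]
a_hat, alpha_hat = fit_power_law(computes, losses)

# Extrapolate to a run 1000x larger than the biggest one we "trained".
predicted = a_hat * 1e24 ** (-alpha_hat)
print(a_hat, alpha_hat, predicted)
```

Real measurements are noisy and often include an irreducible-loss term, so the fit is never this exact, but the log-log straight line is the core of why extrapolation works at all.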