by Legend2440
1040 days ago
Because with infinite hardware I could run neural architecture search and find the optimal model architecture. I could also train a learned optimizer to replace gradient descent as the training process. And even without either of those, scaling laws mean performance improves in a predictable way with more compute.
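
To make the "predictable" part concrete, here is a rough sketch of what fitting a scaling law can look like, assuming the simplest form, loss = a * compute^(-b), with no irreducible-loss term. The constants and data points are made up for illustration (they are not measurements from any real model), and `fit_power_law` and `predict_loss` are hypothetical helper names.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b), done in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of log(loss) vs log(compute).
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** (-b)

# Synthetic points generated from a known power law (assumed values, not real data).
TRUE_A, TRUE_B = 10.0, 0.05
compute = [1e15, 1e16, 1e17, 1e18]
loss = [TRUE_A * c ** (-TRUE_B) for c in compute]

a, b = fit_power_law(compute, loss)
# Extrapolate two orders of magnitude beyond the largest "observed" budget.
extrapolated = predict_loss(a, b, 1e20)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e20: {extrapolated:.3f}")
```

On a log-log plot a power law is a straight line, which is why a handful of small runs can be extrapolated to a much larger budget. Real fits are noisier and usually need an extra constant for the loss floor.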