by yorwba
236 days ago
One problem with testing one change at a time is that when each experiment takes many GPU hours, you can only run a few experiments, and so you can only test a few changes. If you can come up with and implement new changes much faster than you can test them, it would be more efficient to test several changes per experiment and use some form of Bayesian optimization to find the best combination in as few experiments as possible.
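To make that concrete, here is a rough sketch of one way it could look, not a claim about what anyone actually ran. It assumes each change has an independent additive effect on the score (no interactions, which is a strong assumption), fits a Bayesian linear model over on/off flags, and picks each next experiment by Thompson sampling. `run_experiment`, `TRUE_EFFECTS`, and the prior/noise variances are made up for illustration; a real `run_experiment` would be the expensive training run.

```python
import itertools
import numpy as np

# Hypothetical per-change effects, only used to simulate results here.
TRUE_EFFECTS = np.array([0.5, -0.2, 0.3, 0.0])

def run_experiment(config, rng):
    """Stand-in for an expensive training run: returns a noisy score."""
    return float(config @ TRUE_EFFECTS + rng.normal(scale=0.1))

def posterior(X, y, prior_var=1.0, noise_var=0.01):
    """Gaussian posterior over per-change effects (additive model, assumed)."""
    d = X.shape[1]
    precision = np.eye(d) / prior_var + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2  # remove tiny asymmetries from the inverse
    mean = cov @ X.T @ y / noise_var
    return mean, cov

def thompson_search(n_changes=4, n_experiments=8, seed=0):
    """Each experiment turns on several changes at once, chosen by sampling."""
    rng = np.random.default_rng(seed)
    # Every on/off combination of the candidate changes.
    candidates = np.array(
        list(itertools.product([0, 1], repeat=n_changes)), dtype=float
    )
    X = np.empty((0, n_changes))
    y = np.empty(0)
    for _ in range(n_experiments):
        mean, cov = posterior(X, y)
        # Draw one plausible set of effects, then run the combination
        # that would be best if that draw were the truth.
        sampled = rng.multivariate_normal(mean, cov)
        config = candidates[np.argmax(candidates @ sampled)]
        score = run_experiment(config, rng)
        X = np.vstack([X, config])
        y = np.append(y, score)
    mean, _ = posterior(X, y)
    best = candidates[np.argmax(candidates @ mean)]
    return best.astype(int).tolist(), mean

if __name__ == "__main__":
    best, effects = thompson_search()
    print("recommended combination:", best)
    print("estimated per-change effects:", np.round(effects, 2))
```

The point is that every run contributes information about all the changes it switches on, so eight runs say something about all four changes instead of testing at most eight changes one by one. If changes interact, the additive model would need interaction terms or a different surrogate.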
Or use more modern Bayesian methods, if you're more interested in getting the best results from a given hyperparameter sweep.
However, that is not to detract from the excellent effort made here and the great science being investigated. Write-ups like this offer so much gold to the community.