Hacker News new | ask | show | jobs
by hodgehog11 236 days ago
It's a hyperparameter much like learning rate. If the learning rate is too high, the training process would not work either. Addressing this is just a matter of a grid search.