Y
Hacker News
new
|
ask
|
show
|
jobs
by
hodgehog11
236 days ago
It's a hyperparameter much like learning rate. If the learning rate is too high, the training process would not work either. Addressing this is just a matter of a grid search.