by zingelshuher
804 days ago
There are unstable cases where a static learning rate doesn't work: the solution starts wobbling too much after some time and then explodes. Using too small an LR from the beginning leaves you stuck in local minima. Making it stable _is_ possible, but that's a different story.
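To make the wobbling part concrete, here is a toy sketch (not taken from the comment above): plain gradient descent on f(x) = |x|, once with a fixed step and once with an exponentially decaying step. The function names, the starting point, and the 0.3 / 0.9 constants are all made up for illustration, and this only shows the oscillation, not the blow-up that can happen on real loss surfaces.

```python
def sign(x):
    # Subgradient of |x|; returns 0.0 exactly at the minimum.
    return (x > 0) - (x < 0)


def descend(x0, steps, lr0, decay):
    # Hypothetical helper: gradient descent on f(x) = |x| with
    # learning rate lr0 * decay**t at step t (decay=1.0 means static LR).
    x = x0
    for t in range(steps):
        lr = lr0 * decay**t
        x -= lr * sign(x)
    return x


# Static LR: the iterate keeps jumping back and forth across the minimum.
x_const = descend(1.0, 200, 0.3, 1.0)

# Decaying LR: the jumps shrink each step, so the iterate settles near 0.
x_decay = descend(1.0, 200, 0.3, 0.9)

print(abs(x_const), abs(x_decay))
```

With the static rate the distance from the minimum never drops below roughly the step size, while the decayed run ends up within the (tiny) final step size of zero. The flip side is the one mentioned above: shrink the step too early and you stop moving before you get anywhere useful.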
Here's another person on Stack Overflow who figured this out: https://stackoverflow.com/a/44844544
PyTorch and TF both use a default of 1e-8.