by johndough
548 days ago
I tried using it to train a CNN-based CIFAR10 classifier, and it worked well (only a tiny bit worse than Adam, and the gap might close with hyperparameter tuning). However, the optimizer failed completely (loss -> infinity) when training a U-Net for an image segmentation task. I had to increase eps to 1e-4 and decrease lr to 1e-3 to keep it from exploding, but that made it very slow to converge. My summary: the memory savings might be great when it works, but it does not work everywhere.

Adam, on the other hand, generally gets you pretty good results without much futzing with hyperparameters.