Hacker News new | ask | show | jobs
by yorwba 2908 days ago
From the article:

So, weight decay is always better than L2 regularization with Adam then? We haven’t found a situation where it’s significantly worse, but for either a transfer-learning problem (e.g. fine-tuning Resnet50 on Stanford cars) or RNNs, it didn’t give better results.