	by yorwba 2908 days ago
	From the article: "So, weight decay is always better than L2 regularization with Adam then? We haven’t found a situation where it’s significantly worse, but for either a transfer-learning problem (e.g. fine-tuning Resnet50 on Stanford cars) or RNNs, it didn’t give better results."
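
	For anyone unsure what the two options in that quote actually do differently, here is a minimal sketch of a single Adam update on one scalar weight. It is not the article's code: the function name `adam_step`, its arguments, and the default hyperparameters are all made up for illustration. With L2 regularization the penalty `wd * w` is added to the gradient and so passes through Adam's adaptive rescaling; with decoupled weight decay the weight is shrunk directly and the moment estimates never see the penalty.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.01, wd=0.1, decoupled=False,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar weight (illustrative helper, not a library API).

    decoupled=False: L2 regularization, the penalty is folded into the gradient.
    decoupled=True:  decoupled weight decay, the weight is shrunk directly.
    """
    if decoupled:
        # Decay is computed from the current weight and bypasses the moments.
        decay = lr * wd * w
    else:
        # L2 penalty becomes part of the gradient that Adam rescales.
        grad = grad + wd * w
        decay = 0.0

    # Standard Adam moment estimates with bias correction (t starts at 1).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    w = w - lr * m_hat / (math.sqrt(v_hat) + eps) - decay
    return w, m, v


# Same starting point, zero loss gradient, so only the regularizer acts.
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, decoupled=False)
w_dec, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, decoupled=True)
print(w_l2, w_dec)
```

	In this toy setup the L2 variant moves the weight by about `lr` (to roughly 0.99) no matter how small `wd` is, because Adam normalizes the gradient magnitude, while the decoupled variant moves it by exactly `lr * wd * w` (to 0.999). That is the mechanical difference only; it does not by itself say which one trains better on a given problem, which is the point of the quoted passage.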