| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ipunchghosts 387 days ago
	I am glad they evaluated this hypothesis using weight decay which is primarily thought of to induce a structured representation. My first thought was that the entire paper was useless if they didn't do this experiment. I find it rather interesting that the structured representations go from sparse to full to sparse as a function of layer depth. I have noticed that applying weight decay penalty as an exponential function of layer depth gives improved results over using a global weight decay.