by ipunchghosts
387 days ago
I am glad they evaluated this hypothesis using weight decay, which is primarily thought of as a way to induce structured representations. My first thought was that the entire paper would be useless without this experiment. I find it rather interesting that the structured representations go from sparse to full to sparse as a function of layer depth. I have noticed that applying the weight decay penalty as an exponential function of layer depth gives improved results over using a single global weight decay.
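
For anyone who wants to try the depth-dependent penalty, here is a rough sketch of what I mean, not a tuned recipe. The function name `depthwise_weight_decay` and the parameters `base_decay` and `rate` are made up for illustration, and the form `base_decay * exp(rate * depth)` is an assumption; a negative `rate` would make the penalty shrink with depth instead of grow.

```python
import math

def depthwise_weight_decay(num_layers, base_decay=1e-4, rate=0.5):
    """Return one weight-decay coefficient per layer.

    Assumed form (hypothetical, for illustration only):
        decay(depth) = base_decay * exp(rate * depth)
    where depth 0 is the first layer.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    return [base_decay * math.exp(rate * depth) for depth in range(num_layers)]

# Example: four layers, penalty growing with depth.
decays = depthwise_weight_decay(num_layers=4, base_decay=1e-4, rate=0.5)
for depth, wd in enumerate(decays):
    print(f"layer {depth}: weight_decay = {wd:.2e}")
```

In a framework that supports per-parameter-group optimizer options (PyTorch optimizers accept a `weight_decay` key per parameter group, for example), each layer's parameters would go in their own group with the matching coefficient, instead of one global value.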