Hacker News
by fxtentacle 944 days ago
Thanks for pointing that out :) When I first read the paper, I thought that section 4, DIFFERENTIABLE MADDNESS, was still part of section 3, BACKGROUND.

Also, I have to admit that I don't quite understand that section, even after a second read. The text implies that S_c would be 15x4 and H_c would be 16x15, but in the illustration they look like 3x2 and 4x3. I guess I'll have to read Zhang [37] first, because as it stands I'm not sure what the selection matrix and the description matrix do here. That said, (8) and what follows are easy to understand again: you use the softmax to get an approximately correct gradient, but use the hard maximum to calculate the forward-pass values.
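For anyone else following along, the trick in that last sentence is a straight-through style estimator: the forward pass makes a hard argmax decision, and the backward pass pretends it was a softmax. Below is a minimal pure-Python sketch of that idea for a single vector of logits. The function names and the `temperature` parameter are my own hypothetical illustration, not taken from the paper.

```python
import math

# Hypothetical helper names for illustration; not from the paper.

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hard_max_forward(logits):
    """Forward pass: one-hot vector at the argmax (the hard decision)."""
    k = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == k else 0.0 for i in range(len(logits))]

def soft_max_backward(logits, grad_out, temperature=1.0):
    """Backward pass: treat the forward as softmax and return dL/dlogits.

    For p = softmax(z / T), dp_i/dz_j = p_i * (delta_ij - p_j) / T, hence
    dL/dz_j = p_j * (g_j - sum_i g_i * p_i) / T.
    """
    p = softmax(logits, temperature)
    dot = sum(g * pi for g, pi in zip(grad_out, p))
    return [pj * (gj - dot) / temperature for pj, gj in zip(p, grad_out)]

logits = [2.0, 0.5, -1.0]
print(hard_max_forward(logits))                      # hard one-hot used for the values
print(soft_max_backward(logits, [1.0, 0.0, 0.0]))   # smooth gradient used for learning
```

In an autodiff framework the same effect is usually written as `soft + stop_gradient(hard - soft)` (e.g. `jax.lax.stop_gradient`, or `.detach()` in PyTorch): the value equals the hard one-hot, but the gradient is the softmax's.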