by octbash 2336 days ago
Yes, the Reformer is basically trading noisier training for faster training / memory savings.
1 comments

Yep, and I'm not saying it's a bad approach! Just trying to answer "why is that any worse than, say, starting with randomly initialized weights in general?" wrt gradient passing.

I'm not sure I'd agree with the "noisy" characterization, which to me implies stochasticity, whereas this is just blocking the flow of gradient information to save memory.
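
To make the distinction concrete, here's a toy sketch (not the Reformer's actual mechanism, just a made-up scalar function `w * w**2` with hand-written derivatives; all function names are hypothetical). Blocking the gradient through one factor gives the same answer every time, it's just a different (biased) answer, whereas a noisy estimate changes from call to call but averages out to the true gradient.

```python
import random

def full_grad(w):
    # True derivative of w * (w ** 2) with respect to w: 3 * w ** 2
    return 3 * w ** 2

def blocked_grad(w):
    # Treat the inner factor (w ** 2) as a constant, as if gradient flow
    # through it were cut off: d/dw of w * c is c, i.e. w ** 2
    return w ** 2

def noisy_grad(w, rng, scale=0.5):
    # Unbiased but stochastic estimate: true gradient plus zero-mean noise
    return full_grad(w) + rng.gauss(0.0, scale)

rng = random.Random(0)  # fixed seed so runs are repeatable
w = 2.0

blocked = [blocked_grad(w) for _ in range(3)]
noisy = [noisy_grad(w, rng) for _ in range(3)]
noisy_mean = sum(noisy_grad(w, rng) for _ in range(10000)) / 10000

print("full:      ", full_grad(w))
print("blocked:   ", blocked)     # identical on every call, but not the full gradient
print("noisy:     ", noisy)       # differs on every call
print("noisy mean:", noisy_mean)  # close to the full gradient
```

So the blocked version is deterministic and biased, while the noisy one is stochastic and unbiased. Whether the bias matters in practice is a separate question.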