Hacker News
by hantusk 830 days ago
Digging into the low-rank structure of the gradients, instead of the weights, seems like a promising direction for training from scratch with lower memory requirements: https://twitter.com/AnimaAnandkumar/status/17656138151468933...
1 comment
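Below is a minimal sketch of the general idea, assuming NumPy and a toy least-squares objective. Instead of storing Adam's moment estimates at the full m×n size of a weight matrix, each gradient is projected onto its top-r left singular vectors. The moments are kept in that r×n space, and the update is mapped back to m×n. `LowRankGradAdam`, `refresh_every` and `demo` are hypothetical names for illustration. This is not the method from the linked thread. How often to refresh the projection, and what to do with the moments when it changes, are assumptions.

```python
import numpy as np


class LowRankGradAdam:
    """Hypothetical sketch: Adam whose moment buffers live in a rank-r
    projection of the gradient rather than at the full m x n size."""

    def __init__(self, shape, rank, lr=0.05, betas=(0.9, 0.999), eps=1e-8, refresh_every=25):
        _, n = shape
        self.rank, self.lr, self.eps = rank, lr, eps
        self.b1, self.b2 = betas
        self.refresh_every = refresh_every
        self.P = None                     # m x r projection matrix
        self.M = np.zeros((rank, n))      # first moment: r x n, not m x n
        self.V = np.zeros((rank, n))      # second moment: r x n, not m x n
        self.t = 0

    def step(self, W, G):
        if self.t % self.refresh_every == 0:
            # Dominant subspace of the current gradient: its top-r left singular vectors.
            U, _, _ = np.linalg.svd(G, full_matrices=False)
            self.P = U[:, : self.rank]
            # Simplification (assumption): moments are carried over unchanged when P is refreshed.
        self.t += 1
        R = self.P.T @ G                  # compressed gradient, r x n
        self.M = self.b1 * self.M + (1 - self.b1) * R
        self.V = self.b2 * self.V + (1 - self.b2) * R**2
        m_hat = self.M / (1 - self.b1**self.t)
        v_hat = self.V / (1 - self.b2**self.t)
        N = m_hat / (np.sqrt(v_hat) + self.eps)
        return W - self.lr * (self.P @ N)  # project the update back to m x n


def demo(m=64, n=32, k=128, rank=4, steps=300, seed=0):
    """Fit W in min 0.5 * ||A W - B||^2 / k using low-rank gradient steps (toy problem)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k, m))
    B = A @ rng.standard_normal((m, n))
    W = np.zeros((m, n))
    opt = LowRankGradAdam((m, n), rank)

    def loss(W):
        return 0.5 * np.sum((A @ W - B) ** 2) / k

    initial = loss(W)
    for _ in range(steps):
        G = A.T @ (A @ W - B) / k         # full gradient of the loss, m x n
        W = opt.step(W, G)
    return initial, loss(W), opt


if __name__ == "__main__":
    initial, final, opt = demo()
    print(f"loss {initial:.1f} -> {final:.1f}; Adam state {opt.M.shape} vs weight (64, 32)")
```

With m=64, n=32 and r=4, the two moment buffers hold 2·4·32 = 256 floats instead of 2·64·32 = 4096, plus 64·4 = 256 for the projection itself. Because the projection is refreshed every few steps, the accumulated updates can span more than r directions. The weights themselves are therefore not constrained to be low rank, which is the contrast with low-rank weight parameterizations that the post draws.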

Simo linked to some older papers that explore the same idea: https://twitter.com/cloneofsimo/status/1765796493955674286