Hacker News
by
hantusk
830 days ago
Digging into the low-rank structure of the gradients, rather than the weights, seems like a promising direction for training from scratch with lower memory requirements:
https://twitter.com/AnimaAnandkumar/status/17656138151468933...
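For anyone who wants the gist in code, here is a rough sketch of the general idea, not necessarily the linked method. It assumes a plain SVD-based projection and an SGD-with-momentum update; the helper names (`project_gradient`, `low_rank_momentum_step`) and the synthetic rank-4 gradient are made up for illustration. The point is that the optimizer state is stored in an r x n subspace instead of the full m x n shape.

```python
import numpy as np

def project_gradient(grad, rank):
    """Hypothetical helper: basis of the top-`rank` left singular vectors
    of the gradient, plus the gradient compressed into that basis."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]              # (m, r) orthonormal basis
    return P, P.T @ grad         # (r, n) compressed gradient

def low_rank_momentum_step(W, grad, P, state, lr=0.01, beta=0.9):
    """Hypothetical helper: momentum update whose buffer lives in the
    (r, n) subspace; the update is projected back to (m, n) before use."""
    g_low = P.T @ grad
    state = beta * state + (1 - beta) * g_low
    W = W - lr * (P @ state)
    return W, state

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4
W = rng.standard_normal((m, n))

# Synthetic gradient that is exactly rank r (an assumption for the demo;
# real gradients are only approximately low rank).
grad = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

P, g_low = project_gradient(grad, r)
state = np.zeros((r, n))         # momentum buffer: r*n entries, not m*n
W_new, state = low_rank_momentum_step(W, grad, P, state)

print("optimizer state entries:", state.size, "vs full-size:", W.size)
```

The weights themselves stay full rank here; only the gradient statistics are compressed, which is what distinguishes this from putting a low-rank constraint on the weights.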
1 comment
hantusk
830 days ago
Simo linked some older papers with this same idea:
https://twitter.com/cloneofsimo/status/1765796493955674286