by WithinReason
9 days ago
Gradient information can be compressed by around 10,000x with the right tricks; I think that is achievable. Nous claims they have already done it (https://github.com/NousResearch/DisTrO), and earlier gradient compression papers have also reported large compression rates.
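
For a feel of what one of the older tricks looks like, here is a rough sketch of top-k sparsification: send only the largest-magnitude gradient entries and treat the rest as zero. This is not DisTrO's method; the function names `topk_compress` and `decompress` and the sizes are made up for illustration, and the gradient is random data.

```python
import random

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    indices = sorted(order[:k])
    return indices, [grad[i] for i in indices]

def decompress(indices, values, size):
    """Rebuild a dense gradient, with zeros wherever nothing was sent."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

random.seed(0)
n = 100_000                      # hypothetical gradient size
grad = [random.gauss(0.0, 1.0) for _ in range(n)]

k = 10                           # send 10 of 100,000 values
indices, values = topk_compress(grad, k)
restored = decompress(indices, values, n)

# Ratio counted in gradient values only; the indices also have to be
# sent, so the saving in bytes is smaller than this number suggests.
ratio = n / k
print(f"sent {k} of {n} values, {ratio:.0f}x fewer")
```

Dropping that much of the gradient is lossy, so whether training still converges at such a rate is a separate question from whether the arithmetic works out.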