by The_rationalist 1924 days ago
See also zeroth-order backpropagation, which allows 300x faster training while not reducing throughput that much:
https://arxiv.org/abs/2011.08895
How much does ZeRO-3 affect accuracy?
See also
https://github.com/microsoft/fastformers