Hacker News
by The_rationalist 1924 days ago
See also zeroth-order backpropagation, which allows 300x faster training while not reducing throughput that much: https://arxiv.org/abs/2011.08895. How much does ZeRO-3 affect accuracy?

See also https://github.com/microsoft/fastformers