Hacker News
by rdedev 903 days ago
Ah, my bad. I was using mixed precision training in my previous comment.

You might find this paper interesting: https://arxiv.org/pdf/2010.06192.pdf