by ml_hardware 2085 days ago
If I may ask, why are Inception-style workloads still popular, rather than architectures like EfficientNet?

Also, why FP32? CNNs are some of the most robust models to train in FP16 (much easier than language models), so you could get yourself a quick XXX speedup and 2x memory savings by switching over.

(btw not intending to be accusatory or anything, I just think FP16 training deserves a lot more adoption than it currently seems to have :)
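
For anyone wondering where the 2x memory figure comes from, here's a rough back-of-the-envelope sketch. It only counts raw storage for a dense tensor; `tensor_bytes` and `NUM_ELEMENTS` are made-up names and numbers for illustration, not from any real model.

```python
import struct

# Bytes per element for IEEE 754 single ("f") and half ("e") precision.
FP32_BYTES = struct.calcsize("<f")
FP16_BYTES = struct.calcsize("<e")


def tensor_bytes(num_elements: int, bytes_per_element: int) -> int:
    """Raw storage for a dense tensor, ignoring allocator overhead."""
    return num_elements * bytes_per_element


# Hypothetical element count, chosen only to make the arithmetic easy.
NUM_ELEMENTS = 10_000_000

fp32_total = tensor_bytes(NUM_ELEMENTS, FP32_BYTES)
fp16_total = tensor_bytes(NUM_ELEMENTS, FP16_BYTES)

print(f"FP32: {fp32_total / 1e6:.0f} MB, FP16: {fp16_total / 1e6:.0f} MB")
print(f"ratio: {fp32_total / fp16_total:.1f}x")
```

That's the best case for tensors actually stored in FP16. Mixed-precision setups that keep an FP32 master copy of the weights will see less than 2x overall, with most of the savings coming from activations.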