by hedgehog 1720 days ago
One thing to note on "Train with lower precision": on newer hardware with TF32 support, TF32 gives you much of the speedup of FP16 without being as finicky. It doesn't save memory, but it's still useful. It's automatic in PyTorch; I'm not sure about TensorFlow:

https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-...

This is mostly important because these settings can significantly affect the price/perf evaluation for your specific model & the available hardware.
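
For anyone who wants to pin these settings rather than rely on defaults, here is a minimal sketch. The two `allow_tf32` flags are the ones described in the PyTorch CUDA notes; `set_tf32` is just a hypothetical helper name. As far as I know the matmul default has changed between PyTorch releases, so setting both flags explicitly keeps benchmarks comparable. The flags only have an effect on GPUs with TF32 support (Ampere or newer); elsewhere they can be set but do nothing.

```python
import torch


def set_tf32(enabled: bool = True) -> None:
    """Hypothetical helper: turn TF32 on or off for matmuls and cuDNN convolutions."""
    # Controls whether float32 matrix multiplications may use TF32 (cuBLAS).
    torch.backends.cuda.matmul.allow_tf32 = enabled
    # Controls whether cuDNN convolutions may use TF32.
    torch.backends.cudnn.allow_tf32 = enabled


set_tf32(True)
print("matmul TF32:", torch.backends.cuda.matmul.allow_tf32)
print("cuDNN TF32:", torch.backends.cudnn.allow_tf32)
```

Running the same training step with `set_tf32(True)` and `set_tf32(False)` is a quick way to see how much of the speedup your particular model actually gets.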