Even for fp64 it adds only 16 bytes per parameter.
RMSProp and Adagrad have half of this overhead.
Plain SGD (without momentum) has no optimizer overhead, of course.
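
The arithmetic behind these figures can be sketched as follows. This is only an illustration, and it assumes the 16-byte figure refers to an Adam-style optimizer that keeps two state buffers per parameter, and that RMSProp and Adagrad are used in their basic forms with one buffer each. The names `DTYPE_BYTES`, `STATE_BUFFERS` and `optimizer_bytes_per_param` are made up for this example and do not come from any library.

```python
# Bytes per element for common floating-point dtypes.
DTYPE_BYTES = {"fp16": 2, "fp32": 4, "fp64": 8}

# Assumed number of per-parameter state buffers for each optimizer.
STATE_BUFFERS = {
    "adam": 2,     # first and second moment estimates (assumption about "it")
    "rmsprop": 1,  # running average of squared gradients (basic variant)
    "adagrad": 1,  # accumulated squared gradients
    "sgd": 0,      # plain SGD, no momentum buffer
}


def optimizer_bytes_per_param(optimizer: str, dtype: str) -> int:
    """Extra bytes of optimizer state stored for each model parameter."""
    return STATE_BUFFERS[optimizer] * DTYPE_BYTES[dtype]


if __name__ == "__main__":
    for name in STATE_BUFFERS:
        print(name, optimizer_bytes_per_param(name, "fp64"), "bytes per parameter")
```

Multiplying the per-parameter figure by the number of parameters gives the total state size. Variants that keep extra buffers (for example SGD with momentum) would need their entry in the table adjusted.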