Even for fp64 it adds only 16 bytes per parameter.
RMSprop and Adagrad have half of this overhead.
Plain SGD (without momentum) has no optimizer overhead, of course.
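
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not a measurement. It assumes the 16-byte figure refers to an optimizer that keeps two state buffers per parameter (as Adam does) and that each optimizer is used in its basic form. The names `DTYPE_BYTES`, `STATE_BUFFERS`, and `optimizer_state_bytes` are hypothetical helpers introduced for this example.

```python
# Bytes per element for common floating-point dtypes.
DTYPE_BYTES = {"fp16": 2, "fp32": 4, "fp64": 8}

# Per-parameter state buffers, ASSUMING the basic form of each optimizer:
# Adam keeps two (first and second moment), RMSprop and Adagrad keep one,
# and plain SGD without momentum keeps none.
STATE_BUFFERS = {"adam": 2, "rmsprop": 1, "adagrad": 1, "sgd": 0}


def optimizer_state_bytes(optimizer: str, dtype: str, num_params: int = 1) -> int:
    """Return the optimizer-state memory in bytes for `num_params` parameters."""
    return STATE_BUFFERS[optimizer] * DTYPE_BYTES[dtype] * num_params


if __name__ == "__main__":
    for name in ("adam", "rmsprop", "adagrad", "sgd"):
        print(name, optimizer_state_bytes(name, "fp64"), "bytes per parameter")
```

Variants that keep extra buffers (for example momentum in SGD or RMSprop) would need larger counts in `STATE_BUFFERS`.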