I don't think it shows that training uses more energy than inference over the lifetime of the model. They don't appear to report that ratio.