I'm not sure. The figures I've seen suggest that GPT3 required 10x more energy to train than GPT2 (e.g. https://www.nnlabs.org/power-requirements-of-large-language-....), so I think a roughly 1-2 order of magnitude increase in energy usage from GPT2 to GPT3.5 makes sense.