I can definitely imagine they're not covering the amortised cost of the training with the cost per individual inference request. It seems less likely to me that they're making a significant loss on each subsequent request, but again no source from me on that either.
Looking a bit more into this, I found this paper: https://arxiv.org/pdf/2311.16863.pdf. It references a table saying that text generation uses 0.047 kWh per 1000 inferences, which is 1-2 orders of magnitude lower than my estimate. Though that is for GPT2, so possibly tracks to something roughly in the ~0.001 kWh per inference for GPT3.5.
I'm not sure. The figures I've seen suggest that GPT3 required 10x more energy to train than GPT2 (e.g. https://www.nnlabs.org/power-requirements-of-large-language-....), so I think a roughly 1-2 order of magnitude increase in energy usage from GPT2 to GPT3.5 makes sense.
Looking a bit more into this, I found this paper: https://arxiv.org/pdf/2311.16863.pdf. It references a table saying that text generation uses 0.047 kWh per 1000 inferences, which is 1-2 orders of magnitude lower than my estimate. Though that is for GPT2, so possibly tracks to something roughly in the ~0.001 kWh per inference for GPT3.5.