|
|
|
|
|
by lancebeet
94 days ago
|
|
This will sound snarky, so forgive me, but I honestly don't know the answer. Is this actually true? Is there a reliable source containing statistics on LLM compute usage that includes training vs inference for the whole market? |
|
> While AI training is often the most intense and expensive process for a single model, the majority of total AI compute usage (approximately 90%) is used for inference.
> Here is the breakdown of why this is the case: > Inference as High-Volume
> Activity: Inference occurs every time a user interacts with an AI model (e.g., asking ChatGPT a question, using image recognition, or generating code). While a model is trained once (or updated infrequently), it runs millions or billions of inferences continuously.
> Cost Scaling: Training is a massive, one-time upfront cost, while inference is an ongoing, daily operational cost. As the number of AI users grows, the demand for inference compute scales faster than the need for training new, large models.
> The Shift to Efficiency: While early AI hype focused on the immense compute needed for training, the industry has shifted toward making inference cheaper and faster through specialized hardware and techniques like optimization, quantization, and small language models (SLMs).