| I don’t understand why people don’t just use Gemini or some other AI web search to get a quick answer to these kinds of questions (I excluded the sources; you can get them by asking the same question). > While AI training is often the most intense and expensive process for a single model, the majority of total AI compute usage (approximately 90%) is used for inference. > Here is the breakdown of why this is the case:
> Inference as High-Volume Activity: Inference occurs every time a user interacts with an AI model (e.g., asking ChatGPT a question, using image recognition, or generating code). While a model is trained once (or updated infrequently), it runs millions or billions of inferences continuously. > Cost Scaling: Training is a massive, one-time upfront cost, while inference is an ongoing, daily operational cost. As the number of AI users grows, the demand for inference compute scales faster than the need for training new, large models. > The Shift to Efficiency: While early AI hype focused on the immense compute needed for training, the industry has shifted toward making inference cheaper and faster through specialized hardware and techniques like optimization, quantization, and small language models (SLMs). |
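To make the cost-scaling point in the quoted answer concrete, here is a rough Python sketch of how a one-time training cost compares with cumulative inference cost over time. Everything in it is an assumption for illustration: the function names `inference_share` and `days_until_share` are made up, and the numbers are arbitrary compute units, not measurements of any real model.

```python
def inference_share(training_cost, cost_per_query, queries_per_day, days):
    """Fraction of cumulative compute spent on inference after `days` of serving."""
    inference_total = cost_per_query * queries_per_day * days
    return inference_total / (training_cost + inference_total)


def days_until_share(training_cost, cost_per_query, queries_per_day, target_share):
    """Days of serving until inference makes up `target_share` of cumulative compute."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    # Solve I / (T + I) = s for I, which gives I = s * T / (1 - s).
    inference_needed = target_share * training_cost / (1 - target_share)
    return inference_needed / (cost_per_query * queries_per_day)


# Hypothetical values in arbitrary compute units (assumptions, not real data).
TRAINING_COST = 1_000_000      # one-time upfront cost
COST_PER_QUERY = 0.01          # cost of a single inference
QUERIES_PER_DAY = 1_000_000    # steady daily traffic

if __name__ == "__main__":
    for days in (30, 100, 365, 900):
        share = inference_share(TRAINING_COST, COST_PER_QUERY, QUERIES_PER_DAY, days)
        print(f"day {days:>4}: inference share of total compute = {share:.1%}")
    d90 = days_until_share(TRAINING_COST, COST_PER_QUERY, QUERIES_PER_DAY, 0.9)
    print(f"days until inference is 90% of total: {d90:.0f}")
```

With these made-up numbers, inference matches the training cost after 100 days and reaches about 90% of cumulative compute after roughly 900 days. The sketch assumes flat traffic; if usage grows, as the quoted answer suggests, the crossover comes sooner.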