by seanmcdirmid 94 days ago
I don’t understand why people don’t just use Gemini or some other AI web search to get an answer to these kinds of questions quickly (I excluded the sources; you can get them if you ask the same question).

> While AI training is often the most intense and expensive process for a single model, the majority of total AI compute usage (approximately 90%) is used for inference.

> Here is the breakdown of why this is the case:

> Inference as High-Volume Activity: Inference occurs every time a user interacts with an AI model (e.g., asking ChatGPT a question, using image recognition, or generating code). While a model is trained once (or updated infrequently), it runs millions or billions of inferences continuously.

> Cost Scaling: Training is a massive, one-time upfront cost, while inference is an ongoing, daily operational cost. As the number of AI users grows, the demand for inference compute scales faster than the need for training new, large models.

> The Shift to Efficiency: While early AI hype focused on the immense compute needed for training, the industry has shifted toward making inference cheaper and faster through specialized hardware and techniques like optimization, quantization, and small language models (SLMs).


Gemini is not a reliable source. You posted the only part of the AI response that isn't useful in verifying whether it is true.

Sure, I guess. I asked Gemini to give me some markdown of citations and the claims made that address the question:

https://share.google/aimode/v3Y9P3rYIx1oj9VI2

And I finally figured out how to get links to answers instead of just inlining the content as before. Anyways, there it is. We live in a time where questions like "Does inference or training use more compute?" can be answered quickly by just pasting it into a search box.