by lompad
159 days ago
> But inference costs are dropping dramatically over time,
Please prove this statement. So far there is no indication that this is actually true; the opposite seems to be the case. Here are some actual numbers [0] (and whether you like Ed or not, his sources have so far always been extremely reliable).
There is a reason the AI companies never talk about their inference costs. They boast about everything they can find, but inference... not.
[0]: https://www.wheresyoured.at/oai_docs/
Those two things are not contradictory: a company's total inference costs can increase because it deploys more models (Sora), deploys larger models, does more reasoning, and sees more demand.
However, if we look purely at how much it costs to run inference on a fixed number of requests at a fixed model quality, I am quite convinced that inference costs are decreasing dramatically. Here's a model card from late 2025 [1] (see its Model Performance section) with benchmarks comparing a 72B-parameter model (Qwen2.5) from early 2025 to the late-2025 8B Qwen3 model.
The 9x smaller model outperforms the larger one from earlier in the same year on 27 of the 40 benchmarks they were both evaluated on, which is just astounding.
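To make the arithmetic behind that claim explicit, here is a rough back-of-the-envelope sketch. The parameter counts and the 27-of-40 figure come from the comment above. The "about 2 FLOPs per parameter per generated token" rule is my own assumption: a common approximation for dense transformers, not a measured serving cost, and it ignores memory bandwidth, batching, context length and hardware. The function name `flops_per_token` is made up for illustration.

```python
# Figures from the comment: 72B vs 8B parameters, 27 of 40 benchmarks won.
LARGE_PARAMS = 72e9
SMALL_PARAMS = 8e9
BENCHMARKS_WON = 27
BENCHMARKS_TOTAL = 40


def flops_per_token(params: float) -> float:
    """Assumed rule of thumb: ~2 FLOPs per parameter per generated token."""
    return 2.0 * params


# How many times larger the older model is.
size_ratio = LARGE_PARAMS / SMALL_PARAMS

# Under the assumed rule, per-token compute scales linearly with parameters,
# so this ratio equals the size ratio.
compute_ratio = flops_per_token(LARGE_PARAMS) / flops_per_token(SMALL_PARAMS)

# Share of benchmarks where the smaller model scored higher.
win_rate = BENCHMARKS_WON / BENCHMARKS_TOTAL

print(f"size ratio: {size_ratio:.1f}x")
print(f"approx. per-token compute ratio: {compute_ratio:.1f}x")
print(f"benchmarks won by the smaller model: {win_rate:.1%}")
```

So under these assumptions the smaller model needs roughly a ninth of the per-token compute while winning about two thirds of the listed benchmarks. This says nothing about what any provider actually pays per request.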
[1] https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct