|
|
|
|
|
by ACCount36
316 days ago
|
|
What? LLMs do benefit from economies of scale. There are a lot of things like MoE sharding or speculative decoding that only begin to make sense to set up and use when you're dealing with a large inference workload targeting a specific model. That's on top of all the usual datacenter economies of scale. The whole thing with "OpenAI is bleeding money, they'll run out any day now" is pure copium. LLM inference is already profitable for every major provider. They just keep pouring money into infrastructure and R&D - because they expect to be able to build more and more capable systems, and sell more and more inference in the future. |
|