by tfirst
5 days ago
If model performance continues to scale with model size, I have a hard time seeing how local models will have any chance of competing with models hosted on datacenter hardware. 1. There are strong economies of scale in hosting inference (batched prompts, high uptime, shared infrastructure). 2. There are physical limits on how much memory we will be able to produce over the next few years. Demand will probably scale at least as fast as production does, so we won't be saved by falling prices.
|
2) The current memory crunch is more political than cyclical. The only reason we have fabs as far into construction as we do is the CHIPS Act, which predates the public existence of LLMs by more than six months. The horrific silicon prices are a direct result of OpenAI's openly illegal dealings. Their pretense of needing it for Stargate gets sundered further with each missed or cancelled deadline.
They predicted the political and regulatory outcome superbly.