Hacker News
by sognetic 151 days ago
Everything currently points towards inference being the main cost driver for LLMs in the future. Test-time compute requires huge numbers of tokens at inference and makes providing frontier models as a service unprofitable.

Anyone not under some kind of export restrictions can scrounge together some GPUs to train a frontier model (hell, even DeepSeek, which is under these restrictions, could), but providing a service that can compete with OpenAI et al. will prove to be quite costly. 3x improvements in inference are therefore nothing to sneeze at IMO.