|
> Even that $1/Mtok provided by Together AI is heavily subsidized Can you link this? I'm unable to find them offering deepseek-v4-flash. I think you could even host the pro model for a bit under $1/Mtok. You can get ~1000TPS out of the box on a B300 that you can rent for ~$3/hr, so around $0.83/Mtok. Regardless - Alibaba, DeepSeek, NovitaAI, AtlasCloud, Cloudflare, DeepInfra, SiliconFlow, GMICloud, Morph, Baidu, Parasail, DigitalOcean, AkashML, StreamLake and likely others all seem to be offering it under $0.3 per million output tokens[0]. > This makes it unclear how the true cost curve is progressing For no actual improvement in efficiency to be presented as a 10X yearly improvement since 2018, we'd need to currently be getting 100000000X more intensive models than we should be for what we're paying (a $1/Mtok model actually costs $100000000/Mtok). Presenting, say, a 9X actual yearly improvement as a 10X yearly improvement seems feasible, but for much beyond that I think the exponential just compounds too fast to reasonably fake. [0]: https://openrouter.ai/deepseek/deepseek-v4-flash#pricing |