Hacker News
by azinman2 18 days ago
First, there are many, many model inference providers out there worldwide. Just look at OpenRouter. Second, it’s well known in SV that most startups are using Chinese models because they have access to the weights… and that makes it far cheaper.

Why else does Qwen now have cloud-only models?

1 comments

There are plenty of other inference providers, but tell me, who is the cheapest?

Model - Deepseek V4 Pro

CHEAPEST PROVIDER:
Provider: Deepseek
Input Price - $0.435/M tokens
Output Price - $0.87/M tokens
Cache Read - $0.003625/M tokens

SECOND CHEAPEST:
Provider: deepinfra
Input Price - $1.30/M tokens
Output Price - $2.60/M tokens
Cache Read - $0.10/M tokens

Deepinfra is almost 3x more expensive on both input and output, and they are serving an FP4 model with a 16.4K max output (vs 364K) and significantly lower throughput!
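
If you want to check the arithmetic yourself, here is a rough sketch in Python. The prices are copied from the two listings above; the `request_cost` helper and the sample workload are hypothetical, and it assumes cached input tokens are billed at the cache-read rate instead of the normal input rate, which may not match how either provider actually bills.

```python
# Prices in USD per million tokens, copied from the listings above.
PRICES = {
    "deepseek": {"input": 0.435, "output": 0.87, "cache_read": 0.003625},
    "deepinfra": {"input": 1.30, "output": 2.60, "cache_read": 0.10},
}


def price_ratios(cheaper, other):
    """Return how many times more `other` charges than `cheaper`, per price type."""
    return {
        kind: PRICES[other][kind] / PRICES[cheaper][kind]
        for kind in PRICES[cheaper]
    }


def request_cost(provider, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the cost in USD of one workload (hypothetical helper).

    Assumption: cached input tokens are billed at the cache-read rate
    and the remaining input tokens at the normal input rate.
    """
    p = PRICES[provider]
    uncached = input_tokens - cached_tokens
    total = (
        uncached * p["input"]
        + cached_tokens * p["cache_read"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


ratios = price_ratios("deepseek", "deepinfra")
for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.2f}x")

# Hypothetical workload: 10M input tokens (8M of them cache hits), 2M output.
for name in PRICES:
    cost = request_cost(name, 10_000_000, 2_000_000, cached_tokens=8_000_000)
    print(f"{name}: ${cost:.2f}")
```

The input and output ratios both come out just under 3x, but the cache-read ratio is closer to 28x, so under that billing assumption the gap gets wider for workloads with a lot of cache hits.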