by azinman2
18 days ago
First, there are many model inference providers out there worldwide. Just look at OpenRouter. Second, it’s well known in SV that most startups are using Chinese models because they have access to the weights, and that makes it far cheaper. Why else would Qwen now be offering cloud-only models?
Model: DeepSeek V4 Pro
Cheapest provider: DeepSeek. Input price $0.435/M tokens, output price $0.87/M tokens, cache read $0.003625/M tokens.
Second cheapest: DeepInfra. Input price $1.30/M tokens, output price $2.60/M tokens, cache read $0.10/M tokens.
DeepInfra is almost 3x more expensive, and they are serving an FP4 model with a max output of 16.4K (vs. 364K) and significantly lower throughput!
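
For anyone who wants to check the "almost 3x" figure, here is a rough Python sketch using only the prices quoted above. The function names and the billing model in `request_cost` (cached input tokens billed at the cache-read rate, the rest at the input rate) are my own assumptions for illustration, not something either provider documents here.

```python
# Prices in USD per million tokens, copied from the listing above.
PRICES = {
    "DeepSeek": {"input": 0.435, "output": 0.87, "cache_read": 0.003625},
    "DeepInfra": {"input": 1.30, "output": 2.60, "cache_read": 0.10},
}


def price_ratios(expensive, cheap):
    """Return the expensive/cheap price ratio for each price category."""
    return {category: expensive[category] / cheap[category] for category in cheap}


def request_cost(prices, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the cost of a workload in USD.

    Assumption (hypothetical billing model): cached input tokens are billed
    at the cache-read rate and the remaining input tokens at the input rate.
    """
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * prices["input"]
        + cached_tokens * prices["cache_read"]
        + output_tokens * prices["output"]
    )
    return total / 1_000_000


ratios = price_ratios(PRICES["DeepInfra"], PRICES["DeepSeek"])
for category, ratio in ratios.items():
    print(f"{category}: DeepInfra is {ratio:.2f}x the DeepSeek price")
```

On these numbers the input and output ratios both come out just under 3x, while the cache-read gap is much larger (roughly 27.6x), so a cache-heavy workload would widen the difference under the assumed billing model.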