Hacker News new | ask | show | jobs
by 0xbadcafebee 21 days ago
You need to use DeepSeek API directly to gain the extra caching benefits. The DeepSeek provider on OpenRouter is only the 5th-cheapest for V4 Flash, so you have to specify DeepSeek provider when calling OpenRouter. But DeepSeek's API discounts on its models only applies if you call DeepSeek directly. So anyone using OpenRouter to call DeepSeek models is actually losing quite a bit of money.
2 comments

> The DeepSeek provider on OpenRouter is only the 5th-cheapest for V4 Flash

You might have the default settings on your account, which limit Deepseek as a provider. If you disable that feature you see them on openrouter as well (and they serve it at the same cost as their own API).

I just checked my settings and I have everything enabled. https://openrouter.ai/deepseek/deepseek-v4-flash?sort=price (per-1M price) shows DeepSeek provider as #5. https://openrouter.ai/deepseek/deepseek-v4-flash/pricing?sor... (effective price) shows them as #3. The effective price will change your total cost since each provider has a different price for input vs output vs cache, so what's #1 and #5 for one person could be #5 and #1 for somebody else, depending on their workload.

However, I just double checked, and OpenRouter's pricing page for Flash v4 with DeepSeek provider shows a cache hit rate of $0.0028, which is the same as on DeepSeek's official API pricing page ($0.0028), so they do seem to be the same price, (assuming DeepSeek is able to pin your specific OpenRouter requests to the same DeepSeek server). OpenRouter adds 5% to that cost, but still it might be cheaper than the other providers.

Also just found out OpenRouter has a new feature "Response Caching" where they can cache identical requests and return them immediately with no billing. The entire request must be identical, though, not just a prefix, and you have to enable this feature. I don't know who would need to send multiple identical requests, but it's better than nothing?

Interesting, it seems we have some providers offering dsv4-flash cheaper than ds themselves. For the full model it's the other way around, all 3rd party providers are 2x+ more expensive.
The cheaper ones are fp4 and fp8 whereas I assume DeepSeek provider is unquantized, so that probably accounts for it. DeepSeek also doesn't necessarily have the cheapest hardware, other providers could be using it as a loss leader, etc
I belive no sane provider, antropic and openai included, serve BF16.

Side note: I suspect Antropic was experimenting with changing quant level based on server load a few months back which is what caused that major quality drop we saw then.

ZDR is also on by default and deepseek is not ZDR.