So basically, Hy3 is the cheapest decent model on OpenRouter, unless you use DeepSeek as the provider for DeepSeek V4 Flash, in which case DeepSeek's insane caching wins out. (And Hy3 is close-ish on the benchmarks.)
You need to use DeepSeek API directly to gain the extra caching benefits. The DeepSeek provider on OpenRouter is only the 5th-cheapest for V4 Flash, so you have to specify DeepSeek provider when calling OpenRouter. But DeepSeek's API discounts on its models only applies if you call DeepSeek directly. So anyone using OpenRouter to call DeepSeek models is actually losing quite a bit of money.
> The DeepSeek provider on OpenRouter is only the 5th-cheapest for V4 Flash
You might have the default settings on your account, which limit Deepseek as a provider. If you disable that feature you see them on openrouter as well (and they serve it at the same cost as their own API).
However, I just double checked, and OpenRouter's pricing page for Flash v4 with DeepSeek provider shows a cache hit rate of $0.0028, which is the same as on DeepSeek's official API pricing page ($0.0028), so they do seem to be the same price, (assuming DeepSeek is able to pin your specific OpenRouter requests to the same DeepSeek server). OpenRouter adds 5% to that cost, but still it might be cheaper than the other providers.
Also just found out OpenRouter has a new feature "Response Caching" where they can cache identical requests and return them immediately with no billing. The entire request must be identical, though, not just a prefix, and you have to enable this feature. I don't know who would need to send multiple identical requests, but it's better than nothing?
Interesting, it seems we have some providers offering dsv4-flash cheaper than ds themselves. For the full model it's the other way around, all 3rd party providers are 2x+ more expensive.
The cheaper ones are fp4 and fp8 whereas I assume DeepSeek provider is unquantized, so that probably accounts for it. DeepSeek also doesn't necessarily have the cheapest hardware, other providers could be using it as a loss leader, etc
I belive no sane provider, antropic and openai included, serve BF16.
Side note: I suspect Antropic was experimenting with changing quant level based on server load a few months back which is what caused that major quality drop we saw then.