by numlocked 26 days ago
Can you share more? I'm with OpenRouter and we would love to address this! We don't see this in our own testing, I don't believe -- but will share this feedback and dig in.
3 comments

Just try it. In a case last week it was ~3x, and I tried multiple providers: deepseek, gmicloud/fp8, novita/fp8, and another one I can't remember. It was a large job where at least the first 2/3rds of every prompt was exactly the same (literally a static string).

Then I read somewhere (I think on X) that OpenRouter adds stuff and breaks caching (telemetry? headers? can't remember). So I stopped the job, switched to actual DeepSeek provider, and voilà, caching 3x more tokens per request (on average).
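
For anyone wanting to sanity-check how much of a job is even eligible for prefix caching, here is a minimal sketch. It measures the shared leading string across prompts at the character level, which is only a rough proxy for the token-level prefix a provider would cache. The function name and the sample prompts are hypothetical, not taken from the job described above.

```python
import os


def shared_prefix_fraction(prompts: list[str]) -> float:
    """Fraction of the average prompt length covered by the prefix common to all prompts."""
    if not prompts:
        return 0.0
    # os.path.commonprefix compares strings character by character.
    prefix = os.path.commonprefix(prompts)
    avg_len = sum(len(p) for p in prompts) / len(prompts)
    return len(prefix) / avg_len if avg_len else 0.0


# Hypothetical job: a long static instruction block followed by a short per-item suffix.
STATIC = "You are a careful classifier. " * 20
prompts = [STATIC + f"Item {i}: ..." for i in range(3)]

print(f"shared prefix covers {shared_prefix_fraction(prompts):.0%} of the average prompt")
```

If this fraction is high but the provider reports few cached tokens, something between the client and the model is likely changing the start of the prompt or routing requests to different backends.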

> switched to actual DeepSeek provider

I meant actual DeepSeek API.

Here is some data from my experience using deepseek v4 flash both directly and via OpenRouter.

Directly: 135M input tokens - $0.57 (134M cached)

Via OpenRouter: 6M tokens - $0.81 (caching stats and input/output split not reported)

Caching is a huge win when using DeepSeek directly.
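
To put those two lines on the same scale, a rough back-of-the-envelope calculation follows. It is only a sketch: the direct figure covers input tokens, while the OpenRouter figure does not say how the 6M tokens split between input and output, so the ratio is indicative rather than an apples-to-apples comparison. The helper name is made up for illustration.

```python
def usd_per_million(total_usd: float, tokens_millions: float) -> float:
    """Effective cost in USD per million tokens."""
    return total_usd / tokens_millions


# Figures quoted above.
direct = usd_per_million(0.57, 135)        # direct API: 135M input tokens
via_openrouter = usd_per_million(0.81, 6)  # OpenRouter: 6M tokens, split unknown
cache_hit_share = 134 / 135                # cached share of direct input tokens

print(f"direct:         ${direct:.4f} per 1M tokens")
print(f"via OpenRouter: ${via_openrouter:.4f} per 1M tokens")
print(f"ratio:          {via_openrouter / direct:.0f}x")
print(f"cache hit share (direct): {cache_hit_share:.1%}")
```

On these numbers the direct run works out to well under a cent per million tokens with about 99% of input served from cache, against roughly 13.5 cents per million via OpenRouter.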

I am experiencing this using Opencode. Caching works fine via the DeepSeek API but not as well via OpenRouter.