Hacker News new | ask | show | jobs
by arikrahman 5 hours ago
With cache hit rates being effectively free, harnesses like Reasonix have let me do a month of work for less than 2 dollars. It's not even the subsidies making it cheap, American providers like Digital Ocean or Cloudflare host the same model with similar pricing.
3 comments

Cloudflare's Deepseek V4 Pro prices are 4x more than Deepseek's for input and output tokens, and 100x more for cached input tokens, which is crucial for the tool uses of agents which cause multi-turn conversations.
How does caching help here? How much repetition is there in queries?
Agent loops (particularly coding agents) have a huge amount of repetition, because the entire context is included in every model request. So long as it's at the start of the input and doesn't change, it will be able to hit the KV cache (assuming the model provider actually has the prefix in cache).

This only works because prompt caching is done by matching prefixes, not the entire input.

It probably depends on what you're doing, but imagine you're something in the shape of a search engine. How many user queries are unique vs. the same thing someone else searched for an hour ago?
I think this is very likely and something that everyone seems to be missing when valuing these AI firms. AI is not the new industrial revolution, it's the new cloud VM: a very useful commodity software offering.