


	by arikrahman 5 hours ago
	With cache hits being effectively free, harnesses like Reasonix have let me do a month of work for less than 2 dollars. It's not even subsidies making it cheap: American providers like DigitalOcean or Cloudflare host the same model at similar pricing.

3 comments

Scaevolus 3 hours ago

Cloudflare's DeepSeek V4 Pro prices are 4x DeepSeek's own for input and output tokens, and 100x more for cached input tokens. Cached input pricing is the crucial part for agents, because tool use turns every task into a multi-turn conversation.


pjc50 2 hours ago

How does caching help here? How much repetition is there in queries?


jcparkyn 1 hour ago

Agent loops (particularly coding agents) have a huge amount of repetition, because the entire context is included in every model request. So long as the repeated part is at the start of the input and doesn't change, it will hit the KV cache (assuming the model provider actually has the prefix in cache).

This only works because prompt caching is done by matching prefixes, not the entire input.
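
To make the prefix point concrete, here is a minimal sketch of an append-only agent loop and how much of each request could be served from a prefix cache. Everything in it is an assumption for illustration: the function names, the token counts, and the idea that the provider keeps exactly the previous request cached. Real providers typically cache in blocks and expire entries, so treat the numbers as an upper bound, not a measurement.

```python
from typing import List, Sequence, Tuple


def shared_prefix_len(cached: Sequence[int], request: Sequence[int]) -> int:
    """Count the leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n


def simulate_agent_loop(
    system_tokens: int, tokens_per_turn: int, turns: int
) -> List[Tuple[int, int]]:
    """Return (request_length, cached_prefix_length) for each turn.

    Hypothetical model: the context only ever grows at the end, and the
    provider still holds the previous request in its cache.
    """
    context = list(range(system_tokens))  # stand-in token ids
    previous: List[int] = []              # what the cache holds so far
    next_id = system_tokens
    stats = []
    for _ in range(turns):
        stats.append((len(context), shared_prefix_len(previous, context)))
        previous = list(context)
        # Each turn appends a tool call and its result to the context.
        context.extend(range(next_id, next_id + tokens_per_turn))
        next_id += tokens_per_turn
    return stats


def cached_fraction(stats: List[Tuple[int, int]]) -> float:
    """Share of all input tokens that matched a cached prefix."""
    total = sum(length for length, _ in stats)
    hits = sum(hit for _, hit in stats)
    return hits / total


if __name__ == "__main__":
    stats = simulate_agent_loop(system_tokens=1000, tokens_per_turn=500, turns=4)
    for length, hit in stats:
        print(f"request={length} tokens, cached prefix={hit} tokens")
    print(f"cached fraction: {cached_fraction(stats):.2f}")
    # Changing one early token invalidates everything after it:
    print(shared_prefix_len([1, 2, 3, 4], [1, 9, 3, 4]))
```

In this toy model the first request misses entirely and every later request reuses all but the newest turn, so the cached share keeps rising as the conversation gets longer. The last line shows the flip side: change something near the start of the context and only the tokens before the change still match.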


AnthonyMouse 2 hours ago

It probably depends on what you're doing, but imagine you're something in the shape of a search engine. How many user queries are unique vs. the same thing someone else searched for an hour ago?


ForHackernews 3 hours ago

I think this is very likely, and it's something everyone seems to be missing when valuing these AI firms. AI is not the new industrial revolution; it's the new cloud VM: a very useful commodity software offering.
