Hacker News
by declan_roberts 32 days ago
I love the focus on cache hit efficiency. Hats off to the DeepSeek team for creating a great product that maximizes cost efficiency for the user.
4 comments

> Hats off to the DeepSeek team for creating a great product

I have been using it for a while, and I wholeheartedly agree. IMO it is as good as Codex or Claude, which I also use. It is a winner in the cost-sensitive tier, and if some startup could package it with data retention in mind, it could be a great product to sell to enterprises, since data retention and privacy are the main issues for the coding-assistant use case.

DeepSeek V4 Pro is definitely my preferred cheap model. It's very good, and I use it all the time for my personal projects (opencode go plan). I also use Claude Opus all the time at work, and DeepSeek is not as good as that, but it does compete with Sonnet on capability and beats it on price.
I have unlimited Claude Opus at work and it’s wonderful. Not allowed to use it for personal use though.

So I use Deepseek Pro on the $20 Ollama Cloud plan and it’s really not that far behind and I never triggered the plan’s limits.

It’s like 10-15% less powerful but costs 10 times less.

Totally worth it. I prefer Opus because my employer pays for it but I would personally never pay 10 times more for it.

Nice,

I have got unlimited Claude Opus at work as well.

I was really having a hard time deciding between the Ollama and OpenCode plans for personal use. I couldn't really understand how much usage I would get with the Ollama plan, so in the end I went with OpenCode, and I have never hit the limits despite using it most evenings and weekends for several hours.

What models do you use in open code? I too have unlimited opus at work and I tried using my same workflow from work using Kimi 2.6 in open code and... It's just not it, even for relatively simple stuff.

Maybe I should try DS4p?

I use DeepSeek v4 Pro, at max thinking. It's comparable to Sonnet 4.6 on high thinking.
I genuinely don’t think you need Opus 4.7/GPT-5.5 tier models for 95% of tasks in a normal workplace.

People are out there using frontier intelligence to make responsive headers and weekly work reports. You absolutely don’t need the latest and greatest models for this stuff.

Deepseek V4 Pro is an amazing model, even without the unreal cost factored in.

It is my default model at the moment. I'm not doing anything too complex though. I honestly found that more expensive models like Qwen 3.6 fail at tasks DeepSeek nails.

I'm interested in knowing what people are using for tasks which require a bit more thinking. Kimi 2.6? Qwen 3.7? GLM 5.1?

I don't think there are any open models at the moment that can handle the more challenging stuff.

What I use Opus for at work is finding bugs in about 200k lines of microservices and libraries in a niche language. We get bug reports that are missing context, can't easily be reproduced on our dev server, and are usually the result of something deep in multiple services/libraries combining with very custom configs. I can ask Opus (max thinking) to find what could cause the bug, and it usually nails it in a few hours (it would take me 1-2 weeks to trace it myself). The end result is usually fewer than 10 lines of code to fix it, some tests to reproduce the bug, and a nice report explaining it, so it can be checked in an hour or two.

17 Go microservices for a serious project were written perfectly using the latest version of Qwen (3.6). The only areas where we really had to work hard were documentation and a very thorough task breakdown. All of this was tested, and yes, a review was required, but everything was within reason. The deadline was 10 days of 24/7 work, including the review. When we submitted the same task to Opus 4.7/4.6, it had to be stopped after three hours. If you have significant resources for experimentation, you can certainly try. For us, the choice is absolutely clear at this point.
Just in case, note that this project is someone's side project:

> Independent open-source project · not affiliated with DeepSeek

How can you have cache hit efficiency? Isn't it just a matter of not changing the previous context? I don't understand what knobs there are to tweak on this.
> Isn't it just a matter of not changing the previous context?

Yes, but a lot of harnesses change previous context. E.g. the system prompt injects the current time/date, working directory, files in the working directory, etc. Compaction also changes the whole previous context. I _think_ changing the list of tools also invalidates cache, so invoking a subagent with different tools would invalidate the cache.
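
To make that concrete, here is a minimal sketch, assuming a hypothetical harness that renders the prompt from a few parts. The names (`volatile_first`, `volatile_last`, `STABLE_SYSTEM`) are made up for illustration, and characters stand in for tokens. It only shows how much of two consecutive requests is byte-identical from the start, which is the most a prefix cache could reuse.

```python
from os.path import commonprefix

# Hypothetical prompt parts; characters stand in for tokens in this sketch.
STABLE_SYSTEM = "You are a coding agent. Follow the repo conventions."
HISTORY = ["user: fix the failing test", "assistant: looking at tests/test_api.py"]

def volatile_first(now: str) -> str:
    # Timestamp injected at the top: every request starts differently.
    return "\n".join([f"Current time: {now}", STABLE_SYSTEM, *HISTORY])

def volatile_last(now: str) -> str:
    # Same information, appended after the stable part of the context.
    return "\n".join([STABLE_SYSTEM, *HISTORY, f"Current time: {now}"])

def shared_prefix_chars(a: str, b: str) -> int:
    # Length of the identical leading run of two prompts.
    return len(commonprefix([a, b]))

early = shared_prefix_chars(volatile_first("10:00"), volatile_first("10:05"))
late = shared_prefix_chars(volatile_last("10:00"), volatile_last("10:05"))
print(early, late)  # early covers only part of the timestamp line; late covers the whole stable context
```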

My vague impression is that it's in a similar vein to functional programming languages. It generally disallows doing things that lead to bugs (cache misses in this case), and presumably allows you to do those things in a way that makes it much clearer that this is likely to cause cache misses. I would guess that in this paradigm, you don't mutate your existing session, you derive a new session by mutating the prior context into a new context.
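
A rough sketch of what I mean, not taken from the project: the `Session` class and its method names are hypothetical. The session is immutable, appending derives a new session whose rendered context extends the old one, and anything that would break the prefix has to go through a method that says so.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Session:
    # Hypothetical immutable session: nothing here can be edited in place.
    system: str
    tools: tuple[str, ...]
    messages: tuple[str, ...] = ()

    def append(self, message: str) -> "Session":
        # Cache-friendly: the old context is an exact prefix of the new one.
        return replace(self, messages=self.messages + (message,))

    def fork_with_tools(self, tools: tuple[str, ...]) -> "Session":
        # Cache-breaking on purpose; the name makes the cost visible.
        return replace(self, tools=tools)

    def render(self) -> tuple[str, ...]:
        # Order in which the parts would be sent to the model.
        return (self.system, *self.tools, *self.messages)

s1 = Session("sys", ("read", "write"))
s2 = s1.append("user: hi")
s3 = s2.fork_with_tools(("read",))

prefix_kept = s2.render()[: len(s1.render())] == s1.render()
prefix_broken = s3.render()[: len(s2.render())] != s2.render()
print(prefix_kept, prefix_broken)
```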

Changing between plan/build mode in some agents will change the tools list, which breaks the cache.
Cache is always there, it’s just that it only caches up to the point where an input token changes. So if the tools list is early in the prompt, changing it would limit cache for most of the prompt. If the tools list is the last thing, you could still get 99% cache hits even if it changes every turn.
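
A toy illustration of the point above, assuming a pure prefix cache and made-up token lists (`history`, `tools_a`, `tools_b` are placeholders, and the 990/10 split is chosen only to match the 99% figure):

```python
def cached_prefix_len(prev: list[str], new: list[str]) -> int:
    # Count leading tokens that are identical in both requests.
    n = 0
    for a, b in zip(prev, new):
        if a != b:
            break
        n += 1
    return n

def hit_ratio(prev: list[str], new: list[str]) -> float:
    # Fraction of the new request that a prefix cache could reuse.
    return cached_prefix_len(prev, new) / len(new)

history = [f"h{i}" for i in range(990)]        # unchanged conversation
tools_a = [f"toolA{i}" for i in range(10)]     # tools list on turn N
tools_b = [f"toolB{i}" for i in range(10)]     # different tools list on turn N+1

early = hit_ratio(tools_a + history, tools_b + history)  # tools at the start
late = hit_ratio(history + tools_a, history + tools_b)   # tools at the end
print(early, late)
```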
After a couple of turns the system prompt is a small part of the context. Not changing the system prompt at all is key so that the rest of the history is itself part of the prefix.
It depends on the service and how the harness is built. Some services allow very few cache keys, so you won't necessarily get any cache hits if you edit recent messages: the cache is not per message, but covers big blocks of everything up to a cache key.

This was actually surprising to me when I learned about it as I have never worked with (or built) any cache working like that before.
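
A small sketch of that behaviour, assuming a hypothetical service that only stores cache entries at a few fixed positions (called `breakpoints` here; the name and the numbers are made up):

```python
def first_divergence(prev: list[str], new: list[str]) -> int:
    # Index of the first token that differs (or the shorter length if none do).
    for i, (a, b) in enumerate(zip(prev, new)):
        if a != b:
            return i
    return min(len(prev), len(new))

def cached_tokens(prev: list[str], new: list[str], breakpoints: list[int]) -> int:
    # Only a whole block ending at a breakpoint can be reused, so the usable
    # cache is the last breakpoint at or before the first changed token.
    limit = first_divergence(prev, new)
    return max((b for b in breakpoints if b <= limit), default=0)

prev = [f"t{i}" for i in range(1000)]
new = prev.copy()
new[950] = "edited"  # edit a recent message near the end

print(first_divergence(prev, new))            # identical tokens before the edit
print(cached_tokens(prev, new, [100, 500]))   # reusable with two cache keys
print(cached_tokens(prev, new, [960]))        # a single key placed after the edit
```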

Add in the already cheap API cost and you could probably let it run for days on the same task.