Hacker News
by shreedx 3 days ago
I would really love to know if anyone has any experience with something like opencode + Kimi K2.6/2.7 now compared to Claude Code. What is better, what is worse, what is the cost comparison. I am currently paying $100 for the 5x Max plan, but Fable is running through the usage limits quite drastically and I cannot really say it's night and day compared to Opus. Also, I use this mostly for my side projects, so the $100 bill is quite noticeable. I definitely don't want to pay more.
8 comments

I do have this experience. I've used Claude Code (mostly with Opus), and then switched to opencode (mostly with Kimi 2.6) for my personal projects; this is based on a couple of months of use.

Claude Code is better. But opencode + Kimi 2.6 is workable, which is big. For bare code writing, if you know exactly what you want, most popular models are fine (deepseek, kimi, etc.); it feels more or less the same as the Anthropic models.

At the same time, Opus seems to understand my intent way better than e.g. deepseek. I need to be much more precise with my prompts when using deepseek - it often goes in a wrong direction if I'm lazy. This results in a workflow which feels quite a lot different from Claude Code.

Kimi is in between - for me it brings back the "lazy prompting" workflow, and I can trust its plans more than deepseek's. It enables a workflow similar to Claude Code and it's workable, but it is a bit worse everywhere: smaller context, a few more errors, and slightly worse decisions, recommendations, debugging capabilities, etc.

On the usage side, the $100 Claude plan is actually great value. On paper, per-token kimi is way cheaper, but Claude subscriptions are heavily subsidized - you get many more tokens than $100 can buy you at API prices. So, in the end, opencode + kimi vs Claude Code could come out at a similar cost for similar usage patterns. Deepseek can be cheaper, and it has insanely cheap cached tokens, but experience may vary - depending on your habits, you may need to adjust how you work, coming from Claude Code.

I'd say for side projects something like $10 Opencode Go plan + $10 of extra DeepSeek v4 credits (e.g. on OpenRouter) can be very workable.

In my experience the $20 claude/codex plans are even more subsidized, so running on sonnet or gpt5.4 again gives you more usage.
I wonder if they’re truly subsidised or if the API pricing is just massively inflated. Genuine doubt.

My CC stats show me using almost $300 of Sonnet tokens on the $20 plan. Is Anthropic willing to forgo 93% of the profit? Or is it a bit less than that, with the API priced at, say, 3x what it should be?
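The 93% figure follows directly from those two numbers. A quick sketch in Python, assuming the stats mean roughly $300 of API-priced usage covered by a single $20 payment (the function name is made up for illustration):

```python
def implied_discount(api_value_usd: float, plan_price_usd: float) -> float:
    """Fraction of the API list price that the subscriber does not pay."""
    return 1 - plan_price_usd / api_value_usd

# ~$300 of Sonnet tokens at API prices, paid for with a $20 plan.
discount = implied_discount(300, 20)
print(f"{discount:.0%}")  # 93%

# If the API were priced at 3x "what it should be", the same usage would be
# worth $100, and the discount against that value would be smaller.
fair_value = 300 / 3
fair_discount = implied_discount(fair_value, 20)
print(f"{fair_discount:.0%}")  # 80%
```

So even under the 3x-inflated-API guess, the plan would still be priced well below the value of the tokens used.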

CC is great, but Sonnet (my main model) isn’t worth the API pricing. The cheap-but-good models arrive at similar results for much less (for context I’m using Aivo with CC).

Anthropic is making money from people who under-utilize their subscriptions, and presumably by throttling power users, sneakily or not. They are currently in an adoption race. Whether being first will actually let them "win" the market (and the market is a bit ill-defined) is unclear.
My feeling is that I'm getting more usage of Opus (and Fable before the cut) than I got from Sonnet last year. I reached $100 of usage when the weekly limit was at 50%. Extrapolating, that is about $200 a week, so I could squeeze roughly $800 worth of tokens a month out of $20.
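One way to reproduce that $800 estimate, assuming usage scales linearly with the weekly meter and that a billing month is four weekly windows (both are assumptions, and the function name is hypothetical):

```python
def projected_monthly_value(spent_usd: float, weekly_fraction_used: float,
                            weeks_per_month: int = 4) -> float:
    """Extrapolate API-priced usage to a full month of weekly limits."""
    weekly_value = spent_usd / weekly_fraction_used
    return weekly_value * weeks_per_month

# $100 of usage when the weekly meter showed 50%.
monthly = projected_monthly_value(100, 0.5)
print(monthly)       # 800.0
print(monthly / 20)  # 40.0, i.e. 40x the plan price
```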
Anthropic has sent out a newsletter explaining they were more or less adding 50% (even 100%?) of quota to everyone, due to some great deal they made. That might be it. I do get lots more usage lately.
This has generally been my experience as well, but I think the main reason Claude Code is better at understanding intent is its massive system prompt.
>At the same time, Opus seems to understand my intent way better than e.g. deepseek. I need to be much more precise with my prompts when using deepseek - it often goes in a wrong direction if I'm lazy. This results in a workflow which feels quite a lot different from Claude Code.

how much of that is Opus injecting prior conversations from memory?

Almost none of it, if you're using Claude Code. Until recently Claude only had the option of retaining memory across conversations for the desktop app.

I almost never use the desktop app, I have maybe 2-3 conversations over the last year that have nothing to do with my job. Opus (and now Fable) genuinely do seem to "understand" what you intend based off what you're explaining a lot better than other models I've tried.

Gemini gets close in some cases, but it falls over in the actual implementation sometimes. I haven't tried Kimi yet but MiMo isn't too shabby either.

According to this, opencode and cursor cli perform better than claude code: https://x.com/kunchenguid/status/2065345999682568593
The analysis at the bottom directly contradicts the statement.
I use Claude at work and Kimi for side projects. My org has LiteLLM and Kimi 2.5 enabled but it rarely works, so Claude and GPT are my main tools.

I actually enjoy Kimi more, as it feels like a dev in a job interview. Watching it reason through problems is a lot like how I tend to explain things during whiteboarding sessions. The number of times it says "wait" is just funny. Claude, on the other hand, is much more like an employee (or team of employees) who already knows they have the job. It doesn't do a ton of explanation up front (you can dig into its process if you want). It just goes along, asking questions only when it needs to, and then delivers a comprehensive report or plan. OpenCode is a better harness.

I don't have a direct comparison on costs, as I haven't tried the exact same prompt on both models. I can say that I recently had Kimi generate a wrapper around libpq for the ZenC programming language: https://github.com/nobleach/zenc-postgres and it took about an hour or so and cost around 4 dollars.
I am extremely happy with ohmypi, but you could use OpenCode or just keep using Claude Code!

DeepSeek-V4-Pro is adequate; use DS4-Flash for tasks or other small activities you’d use Haiku or Sonnet for. Go sign up with $10 prepaid.

OpenCode Go - go sign up with $5 for a month and use Qwen-3.7-Max for design/plan/architecture or difficult troubleshooting. Feels closer to Opus 3.6 or 3.7 than DeepSeek, closest I’ve found.

OpenAI Codex, $20 a month plan, use GPT-5.5 via API for the same design/plan/architecture/troubleshooting/author commits. (You can also pay $100 and cut and paste really difficult problems into chat with the GPT-5.5-Pro model.)

Xiaomi MiMo-2.5-Pro, find a friend to give you a $2 referral code, you get 72 cents free. Same pricing as DeepSeek. Somewhere between Sonnet and Opus, quite capable. Apply for the UltraSpeed beta too.

You can switch in and out from these models on the fly in OpenCode or ohmypi and simply find the one that feels best to you. I use CodexBar to watch consumption in near real time.

For a casual user or someone new to programming, Cursor’s $20 plan is an excellent start with Composer-2.5 and Composer-2.5-Fast. You get an API allowance too you can use to access Opus-4.x or GPT-5.5-Pro from OpenCode or ohmypi in addition to Cursor itself.

Finally, if you use Grok or Twitter, SuperGrok at $30 a month has a good vision model, which I used for automated testing of front ends. I’m migrating to locally-run Qwen-3-VL on a commodity Mac, though. If you’re less technical, unreach makes hosting local models on a Mac easy.

If you have a powerful GPU like an RTX 5090, try Qwen-3.6 locally on that too. Use ollama or llama-swap which is fairly easy to use.

I have not tried new Kimi yet but we have been able to keep our costs at or below $200 a month per employee with a team of 3 professional developers, 1 graphic designer who uses a lot of Midjourney and Grok Imagine now driven from workflows she made herself in ohmypi, and 1 nontechnical user (account manager / project manager) who uses ohmypi to help her gather requirements and track implementation of them. With a tiny bit of effort we could get that number closer to $75 per employee per month.

Deepseek-V4-Flash-Free on Opencode is what I use most of the time, for simple tasks. Such a good model to give away for free (assuming you're okay with them harvesting your data).
> I am extremely happy with ohmypi, but you could use OpenCode or just keep using Claude Code!

What's the benefit of using OMP over OpenCode?

Just the sheer amount of options in OMP overwhelmed me. But I also use both via ACP in Zed so the CLI itself doesn't matter much.

I'll chime in here as another Pi user. I started with OMP and then ditched it for vanilla Pi + MCP/LSP from OMP, plus pi-cursor-sdk to get access to Composer 2.5 through Pi. I use this as my 'fast' research & report (through my project/codebase) or boilerplate/tightly-specced implementation agent that's available to my planning/orchestrator (GPT 5.5 for complex backend work, Fable/Opus when I want the extended context or for view/design-heavy work).

Pi has been great in this role, as well as occasional use not as a sub-agent but by my own human hands when I want to interact with the Composer 2.5 model directly... which I do much less these days as I just use Codex 5.5 /fast. Pi's great though.

OMP is a fork of Pi[0], which is my preferred harness. Feels solid and minimal. I don't even use any extensions, skills, or modifications. Usually don't even use an AGENTS.md. Just create a small spec.md and/or plan.md for most experiments.

[0]: https://pi.dev/

Almost exactly the same here but I maintain a large committed design.md and a never committed plan.md
I ditched Opencode for OMP. It's more feature packed, well put together, and gives me better results with some steering. Love it
Also, if you do have SuperGrok, forget using Grok, they are giving you Composer 2.5 in Grok Build.
I just switched from llama.cpp to llama-swap with the help of codex. It was great.

I need to try the DSv4 stuff sometime.

I can only talk about GLM 5.1 which is roughly at sonnet 4 levels imo.

It's good, does most tasks that I throw at it well, but will fail at anything cognitive/complex. It gets stuck often. It costs ~$6 a month though.

This was my experience using GLM 5.1 in Claude Code but it works far better in OpenCode, I’d really like to understand why. I think it’s a bit stronger than Sonnet 4.6.

I use the oh-my-openagent planning system and haven’t used vanilla OpenCode enough to know how much that is contributing.

The answer is easy, CC is bug for bug optimized for Anthropic models. They don't even test it with other models, let alone provide support for all small compatibility quirks of different provider implementations.

On the other hand, Opencode, Pi agent and other open source tools offer much better support for all models, including open source ones.

I'm using Claude code + (a patched) litellm proxy + openrouter + Qwen 3.7 max/kimi k2.6/deepseek v4 pro. The only feature that doesn't work is webfetch and web search, which I've replaced with the ddg MCP and a web fetch/search pre hook to redirect the agent. Memory, caching, and everything else works fine.

Qwen comes close to opus for planning but fable is clearly superior. Results for kimi and deepseek are pretty much indistinguishable from opus for coding if opus writes the plan. The biggest difference is output cadence. Kimi for example thinks for a long time then quickly outputs a lot of text.

I'm now testing out fable for research and planning and deepseek v4 flash for coding. I'm guessing results will be pretty similar to opus + deepseek v4 pro and costs should be lower overall.

The Kimi problem is it doesn’t follow instructions and goes off track often.

Other than that it’s pretty decent (for the price).

Sounds like it was distilled from Claude. I don't understand the appeal of an agent that does whatever it wants.
If you ask Claude in Chinese to introduce itself, it will claim it's Kimi :)
> If you ask Claude in Chinese to introduce itself, it will claim it's Kimi :)

That's a funny anecdote, but I'm not able to reproduce it. Where/how/when did you get this, or hear about it? It might've been patched by now, at least that's the feel I get from my limited testing.

Using bare aichat [1] with no system prompt and no temperature or top_p set, and with the same prompt (approx. "Introduce yourself!") every time. I'm truncating each response after the first line that contains the name the model gave, because the point has been made clear by then:

Claude Sonnet 4.5:

> 请做个自我介绍!

你好!我是Claude,一个由Anthropic公司开发的AI助手。 […]

Claude Haiku 4.5:

> 请做个自我介绍!

# 你好!

我是 *Claude*,一个由 Anthropic 公司开发的 AI 助手。

Claude Opus 4.5:

> 请做个自我介绍!

# 你好!

我是 *Claude*,由 Anthropic 公司开发的 AI 助手。

Claude Opus 4.6:

> 请做个自我介绍!

# 你好! 我是 Claude

Claude Opus 4.7:

> 请做个自我介绍!

你好!我是 Claude,由 Anthropic 公司开发的人工智能助手。很高兴认识你!

Claude Opus 4.8:

> 请做个自我介绍!

你好!我是 Claude,由 Anthropic 公司开发的人工智能助手。

Claude Fable 5:

> 请做个自我介绍!

# 自我介绍

你好!很高兴认识你!

我是 *Claude*,由 Anthropic 开发的 AI 助手。 [2]

I don't see a Kimi mention, unfortunately. :-)

[1] https://github.com/sigoden/aichat

[2] This model really is noticeably more verbose even with supposed-to-be-brief responses huh, lol
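For anyone who wants to repeat this, the truncation rule I described (cut each reply after the first line that names a model) is easy to script. This is only a rough sketch in Python: the function name and the list of candidate names are my own assumptions, the reply text is pasted in by hand rather than fetched from any API, and the last line of the sample is a placeholder, not real model output.

```python
def truncate_after_name(response: str, names: list[str]) -> str:
    """Keep lines up to and including the first one that mentions any name."""
    kept = []
    for line in response.splitlines():
        kept.append(line)
        # Case-insensitive substring match against the candidate names.
        if any(name.lower() in line.lower() for name in names):
            break
    return "\n".join(kept)

# Candidate identities to look for (an assumption, extend as needed).
NAMES = ["Claude", "Kimi", "DeepSeek"]

reply = (
    "# 自我介绍\n"
    "\n"
    "你好!很高兴认识你!\n"
    "\n"
    "我是 *Claude*,由 Anthropic 开发的 AI 助手。\n"
    "\n"
    "(rest of reply, placeholder)"
)

truncated = truncate_after_name(reply, NAMES)
print(truncated)
```

The printed text stops at the line containing the name, so the placeholder line is dropped.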

You should put "You are Kimi/DeepSeek, a helpful assistant" in your system prompt and recheck.

That's what the people claiming Kimi is distilled from Claude, and even identifies as Claude, do.

This. It will try to fix and refactor things that don’t need fixing because it gets stuck trying to solve the problem at hand.
Yup. I’m hoping this variant fixes these issues.
For some reason I never had a good experience with Kimi (via OpenRouter) in OpenCode. It would only take a few turns for it to run off and mess something up. Terrible instruction following I’d say.

I use DeepSeek V4 Pro now, which works pretty well.

The best is GLM (though it's not as cheap as DeepSeek or Kimi), used with Claude Code.