by lukewarm707 37 days ago
would be interesting to see some other labs:

- deepseek v4 pro

- glm 5.1

- kimi k2.6

- qwen 3.6 max

- xiaomi 2.5 pro

- minimax 2.7

- grok


I agree!

So far we have been native harnessmaxxing, which simplifies things a lot.

The configuration space around open models is much larger, e.g. which models, capability heterogeneity, which harness, networking, data egress / privacy, etc.

If anyone is getting very good production code out of open models, I'd love to do a user interview to better understand your setup. Email is in my bio.

With how much vendor harnesses now actively steer the agent with their own instructions on top of user prompts, I think it’d be super interesting to see one of the already-tested models (Opus 4.7 or GPT-5.5) compared across a range of non-native harnesses: OpenCode, Pi, Hermes, Kilo Code. The most popular coding-focused harnesses, basically.

Agreed. The harness is really important, especially since many labs are now post-training agents directly in their native harness.

(Which is why my prior is that third party harnesses would not perform as well. But I haven't actually measured this.)

OpenCode seems to give me better results than codex-cli. I’d be interested in seeing this too!