Hacker News
by mccoyb 67 days ago
I'm skeptical that this is going to lead to optimal orchestration ... or rather, skeptical that open source won't produce a far better alternative in time.

The best performance I've gotten is by mixing agents from different companies. Unless there is a "winner take all" agent (I seriously doubt it, based on the dynamics and cost of collecting high quality RL data), I think the best orchestration systems are going to involve mixing agents.

Here, it's not about the planner, it's about the workers. Some agents are just better at certain things than others.

For instance, Opus 4.6 on max does not hold a candle to GPT 5.4 xhigh in terms of bug finding. It's just not even a comparison, iykyk.

It's almost analogous to how diversity of thought can improve the robustness of outcomes in real-world teams. The same thing seems to be true in mixture-of-agent-distributions space.
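To make the "it's about the workers" point concrete, here is a minimal sketch of routing work by task type. Everything in it is an assumption for illustration: the `make_stub` helper, the `vendor-a-model` style names, and the routing table itself are hypothetical placeholders, not real clients or benchmark results.

```python
from typing import Callable, Dict

# An agent is anything that maps a prompt to a text reply.
Agent = Callable[[str], str]


def make_stub(name: str) -> Agent:
    """Hypothetical stand-in for a real vendor client; it only echoes."""
    def run(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return run


# Hypothetical routing table: task type -> worker.
# The assignments are placeholders, not measured results.
WORKERS: Dict[str, Agent] = {
    "bug_finding": make_stub("vendor-a-model"),
    "technical_writing": make_stub("vendor-b-model"),
}
DEFAULT_WORKER: Agent = make_stub("fallback-model")


def dispatch(task_type: str, prompt: str) -> str:
    """Send the prompt to the worker registered for this task type."""
    worker = WORKERS.get(task_type, DEFAULT_WORKER)
    return worker(prompt)
```

The planner only has to pick a task type; which company's model sits behind each entry is a one-line change.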

5 comments

I'm fairly certain these AI companies are lobsters in a bucket. Every time one of them produces a private model, the others will use access to that model to generate improvements _and then publish those improvements_ as a way to keep anyone from cornering that market.

So that'll go on until they form a cartel and become the Wizard of Oz.

Another way to think about it:

For Anthropic to have the best version of this software, they'd have to simultaneously ... well, have the best version of the software, but also beat every other AI company at all subtasks (like: technical writing, diagramming, bug finding -- they'd need to have the unequivocal "best model" in all categories).

Surely their version is not going to allow you to e.g. invoke Codex or what have you as part of their stack.

They would also have to prevent any access to the model from being used to beat the model.

I think Opus does, in fact, find the bugs the same way GPT xhigh (or even high) does. It just discards them before presenting them to the user.

Opus is designed to be a lazy, corner-cutting model. Reviews are just one place where this shows. In my orchestration loop, Opus discards many findings by GPT 5.4 xhigh, justifying this as pragmatism. Opus YAGNIs everything; GPT wants you to consider seismic events in your todo list app. Sadly, there's nothing in between.

My fear is that this is going to lead to an optimal orchestration language. For example, that Claude switches to Sumerian for all communication between agents. It's one thing if they try to silo like that, but my real fear is that it may actually perform well.

(Not sure if it would be Sumerian, Esperanto or something more artificial. As long as it is esoteric enough for one company to hoard all the expertise in it.)

I've seen Antigravity outputting Chinese characters in its thinking traces from time to time.

I also remember Chinese being discussed as a potential orchestrating language, but I don't remember the sources, so this is 100% anecdotal.

Yeah, this has been my experience too: mixing agents/models from different companies.

Having Opus write a spec, then send it to Gemini to revise, back to Opus to fix, then to me to read and approve ...

Send to a local model like Qwen3.5 to build, then off to Opus to review ...

This was such an amazing flow, until Anthropic decided to change their minds.
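For anyone who wants to picture that flow, a rough sketch follows. It assumes each agent is just a function from text to text; the role names (`planner`, `reviser`, `human`, `builder`), the `Stage` type, and the `stub` helper are hypothetical and stand in for real model clients and a real approval step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# An agent takes the current artifact and returns the next one.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class Stage:
    name: str  # what happens at this step
    role: str  # key into the role -> agent mapping


def stub(label: str) -> Agent:
    """Hypothetical placeholder agent: tags the artifact with its label."""
    return lambda text: f"{text} <{label}>"


# The flow described above: spec, revise, fix, approve, build, review.
PIPELINE: List[Stage] = [
    Stage("write spec", "planner"),
    Stage("revise spec", "reviser"),
    Stage("fix spec", "planner"),
    Stage("approve", "human"),
    Stage("build", "builder"),
    Stage("review", "planner"),
]


def run_pipeline(
    task: str, agents: Dict[str, Agent]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Pass the artifact through each stage; return it with a trace."""
    artifact = task
    trace: List[Tuple[str, str]] = []
    for stage in PIPELINE:
        artifact = agents[stage.role](artifact)
        trace.append((stage.name, stage.role))
    return artifact, trace


# Assumed wiring: swap any stub for a real client with the same signature.
agents = {role: stub(role) for role in ("planner", "reviser", "human", "builder")}
result, trace = run_pipeline("todo app", agents)
```

Because the stages only name a role, changing which model handles the spec or the build means editing the `agents` mapping, not the flow.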

This is still very much doable. This is exactly how I'm working. I'm using opencode with a mixture-of-agents I built (https://github.com/tessellate-digital/notion-agent-hive), where the model behind each agent is configurable.
You can still do all of this. With tmux. Nothing Anthropic can do about that.

Gemini CLI is horrible though.