by mrbonner 50 days ago
Your method of combining models to strengthen the implementation reminds me of how we form stronger alloys by combining metals!
1 comment

It also sounds like a lot to manage. Do you have some sort of agentic framework that treats all of these LLMs you have access to as inputs that it optimizes?
Unfortunately not. I'm using plain Kimi, opencode (with DeepSeek, GPT, MiniMax, whatever) and Claude. Claude is the best, but only for some hours. The trick is to get a good AGENTS.md file, good test cases and a test runner to reproduce failures, such as seamless docker and qemu calls. GNU autotools would be easiest, but here I'm using plain makefiles. Also, for clangd (the LSP server) to stay up to date, a compile_commands.json is important. git worktrees helped with developing the ARM port and fixing c-testsuite cases in parallel. I wanted to keep the costs down: about $15-$30, I think.

And for low-level problems, like the ARM calling convention in asm, those models are much better than they are at simple algorithmic Python problems. Only for the hardest problem did I need the big expensive gun, but never Opus. This helps in deciding what to do with my next JIT project.
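
On the compile_commands.json point: with plain makefiles nothing generates that file for you, so here is a minimal Python sketch of writing one by hand. It assumes a flat list of C sources and one shared set of flags; the function names, file names and flags are hypothetical and not taken from the project above. The entry layout (directory, file, arguments) follows Clang's JSON Compilation Database format.

```python
import json
from pathlib import Path


def compile_db_entries(sources, flags, build_dir=".", compiler="cc"):
    """Build one compilation-database entry per source file.

    Assumes every file is compiled with the same flags, which may not
    hold for a real makefile with per-target options.
    """
    directory = str(Path(build_dir).resolve())
    entries = []
    for src in sources:
        obj = str(Path(src).with_suffix(".o"))
        entries.append({
            "directory": directory,  # working directory of the compile
            "file": src,             # path relative to that directory
            "arguments": [compiler, *flags, "-c", src, "-o", obj],
        })
    return entries


def write_compile_db(entries, path="compile_commands.json"):
    """Write the entries where clangd looks for them (the project root)."""
    Path(path).write_text(json.dumps(entries, indent=2) + "\n")
```

Usage would be something like `write_compile_db(compile_db_entries(["main.c"], ["-Wall", "-Iinclude"]))`. A tool such as Bear can instead record the real commands from an actual make run, which is likely more accurate than a hand-maintained list.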

Not OP, but I wrote llm-consortium to prompt multiple models and create a synthesis. It can also run on an OpenAI endpoint using llm-model-gateway. It's expensive, naturally, but for situations where you absolutely must get max intelligence, it's hard to beat.

e.g.

  Pelican Riding a Bicycle — Engineering Study by DeepSeek v4 Pro, Kimi K2.6, and GLM-5.1 (1 iteration in synthesis mode with DeepSeek v4 flash as judge)
https://htmlpreview.github.io/?https://gist.githubuserconten...
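
For anyone wondering what "prompt multiple models and create a synthesis" looks like in general, here is a rough Python sketch of the pattern. It is not llm-consortium's actual code or API: `consortium`, `build_judge_prompt` and the `Model` type are hypothetical names, and each model is assumed to be a plain function that takes a prompt and returns text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical interface: a model is any callable mapping prompt -> text.
Model = Callable[[str], str]


def build_judge_prompt(prompt: str, answers: Dict[str, str]) -> str:
    """Lay out the original task and every member answer for the judge."""
    parts = [f"Task:\n{prompt}\n", "Candidate answers:"]
    for name, text in answers.items():
        parts.append(f"### {name}\n{text}")
    parts.append("Synthesize the single best answer from the candidates.")
    return "\n\n".join(parts)


def consortium(prompt: str, members: Dict[str, Model], judge: Model) -> str:
    """Run one iteration: fan out to all members, then ask the judge."""
    if not members:
        raise ValueError("need at least one member model")
    # Query the members concurrently; real calls are network-bound.
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in members.items()}
        answers = {name: fut.result() for name, fut in futures.items()}
    return judge(build_judge_prompt(prompt, answers))
```

In this sketch one iteration costs one call per member plus one judge call, which is why this kind of setup gets expensive quickly.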