by dev_tools_lab 90 days ago
Nice work on the scheduler. Have you benchmarked parallel inference across multiple models? Running GPT, Claude and Gemini simultaneously on the same input is where latency becomes a real constraint.

GPT-OSS exists but Claude and Gemini aren't available locally, lol.
True, Claude and Gemini aren't available locally yet. I mostly meant running all available local models in parallel.

Even with just open-source LLMs, you can see interesting differences in flagged issues when cross-validating outputs.
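
For anyone curious what that fan-out plus comparison could look like, here's a rough sketch, not the scheduler's actual code. Everything in it is made up for illustration: `run_model` is a hypothetical stand-in that sleeps instead of calling a real backend, and the model names and "flagged issues" are canned. It assumes each local model sits behind its own server or process, so awaiting them concurrently actually overlaps the work.

```python
import asyncio

# Canned outputs standing in for real model responses (illustrative only).
CANNED_ISSUES = {
    "model_a": {"unused variable", "missing null check"},
    "model_b": {"missing null check", "off-by-one"},
    "model_c": {"missing null check"},
}

async def run_model(name: str, prompt: str, delay: float) -> tuple[str, set[str]]:
    # Hypothetical placeholder for a local inference call.
    # The sleep simulates per-model latency; no model is actually run.
    await asyncio.sleep(delay)
    return name, CANNED_ISSUES[name]

async def fan_out(prompt: str) -> dict[str, set[str]]:
    # Send the same prompt to every model concurrently.
    # Wall time is roughly the slowest model, not the sum of all of them.
    delays = {"model_a": 0.03, "model_b": 0.02, "model_c": 0.01}
    calls = [run_model(name, prompt, d) for name, d in delays.items()]
    return dict(await asyncio.gather(*calls))

def cross_validate(results: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    # Issues every model flagged vs. issues only some models flagged.
    agreed = set.intersection(*results.values())
    disputed = set.union(*results.values()) - agreed
    return agreed, disputed

if __name__ == "__main__":
    results = asyncio.run(fan_out("review this diff"))
    agreed, disputed = cross_validate(results)
    print("all models agree on:", sorted(agreed))
    print("only some models flagged:", sorted(disputed))
```

The disputed set is the interesting part: those are the issues worth a second look. If the models run in-process and share a single GPU, the calls would likely serialize anyway, so the latency win depends on how the backends are deployed.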