Hacker News
by android521 183 days ago
Are end-to-end speech models like OpenAI Realtime / Gemini Live, or the open-source Qwen3-Omni, better in terms of latency?
1 comment

There is always a tradeoff between latency and reasoning. Bigger models follow instructions better, so we can get them to do more, but that comes at the cost of higher latency. Smaller open-source models colocated with the rest of the stack do much better on latency, but their instruction following is not as good, and we may have to tune the prompts much more than we would for a bigger model.