by ej88 38 days ago
An omni model seems very useful for real-time human-computer interaction, off the top of my head:

- Voice assistants

- Customer experience

- Gaming

- Meeting assistants

- Real-time coach or user assistant for using software

- Translation

- Real-time work on a computer controlled by voice (frontend / mobile dev, CAD, 3D modeling, etc)

Traditionally, a lot of these use cases with LLM agents are higher latency because the model needs to wait for the speaker to finish, then decide whether to call a tool or respond. If it calls a tool, it needs to process the tool result and then decide again whether to call another tool or respond, etc...
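To make the latency point concrete, here is a rough sketch of that turn-based loop. Everything in it is made up for illustration: `ScriptedModel`, `run_turn`, and the `lookup` tool are hypothetical stand-ins, not any real agent framework or API. The point is only that each step has to finish before the next one can start.

```python
from dataclasses import dataclass


@dataclass
class ScriptedModel:
    """Hypothetical stand-in for an LLM: replays a fixed list of decisions."""
    decisions: list  # each item is ("tool", tool_name) or ("respond", text)
    calls: int = 0

    def decide(self, context):
        # A real model would read the context; this one just replays its script.
        decision = self.decisions[self.calls]
        self.calls += 1
        return decision


def run_turn(model, utterance, tools):
    """Handle one user turn; nothing starts until the full utterance is in."""
    context = [("user", utterance)]
    steps = ["wait_for_end_of_speech"]
    while True:
        kind, payload = model.decide(context)
        steps.append(f"model_call:{kind}")
        if kind == "respond":
            return payload, steps
        # Tool call: run it, append the result, then ask the model again.
        result = tools[payload]()
        context.append(("tool", result))
        steps.append(f"tool:{payload}")


model = ScriptedModel([("tool", "lookup"), ("respond", "It is 18C in Oslo.")])
reply, steps = run_turn(model, "What's the weather in Oslo?", {"lookup": lambda: "18C"})
print(reply)
print(steps)
```

With a single tool call, the user already sits through four sequential steps (end of speech, model call, tool, model call) before hearing anything, and every extra tool call adds two more.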


I'm not saying an omni model isn't useful for HCI - essentially my problem is that these demos seem to be highlighting the model's ability to interrupt the user (which is almost always not a good thing) and its ability to keep time (which should be a non-issue really), and they showcase these using fairly lame use cases.
Ya, the demos were pretty contrived (feels like a running theme amongst the labs...)