Hacker News
by spyder 481 days ago
Seems similar to the Moshi model from 6 months ago, but more refined. Moshi is a little crazy, but it was still an impressive demo of how low-latency responses, continuous listening, and interruptions can improve voice chat and make it feel more real, or more uncanny. (Sometimes its "latency" is even too low, because it interrupts you before you finish.) https://www.youtube.com/watch?v=-XoEQ6oqlbE

They even released some models on Hugging Face:

https://huggingface.co/collections/kyutai/moshi-v01-release-...

1 comment

Saying this is similar to Moshi is like saying GPT-2 is similar to GPT-4. You can't have any sort of conversation longer than 30s with Moshi before it goes bananas. You can talk to this model for an hour and it remains completely coherent.