Hacker News new | ask | show | jobs
by slow_numbnut 949 days ago
Instead of sacrificing flexibility by building one monolith model that does Audio to audio in one go, wouldn't it be better to train a model that handles conversing with the user (knows when the user is done talking, when it's hearing itself, etc) and leave the thinking to other, more generic models?
1 comments

You don't lose flexibility with an end to end model. You lose controllability. But there are ways to mitigate that.