Hacker News
by
pyryt
871 days ago
Knowing when to speak is actually a prediction task in itself. See e.g.
https://arxiv.org/abs/2010.10874
It would indeed be great to get something like this integrated with Whisper, an LLM, and TTS.
2 comments
zachthewf
871 days ago
It's hard for me to imagine that this could be solved in text space. I think the prediction task needs to be done on the audio.
stiffler01
871 days ago
We thought about doing this in Whisper itself, since it's already working in the audio space.
stiffler01
871 days ago
Yes, this is something we want to look into in more detail. We really appreciate you sharing the research.