Hacker News
by graderjs 1217 days ago
Could you not split this into two phases? First run speech separation to get one source per speaker, then run Whisper on each source in isolation (maybe interleaving the prompts)?
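A rough sketch of what I mean, in case it helps. Everything here is an assumption for illustration: `transcribe_two_phase`, `Segment`, and the two callables are hypothetical names, not part of Whisper. `separate` stands in for whatever speech separation model you pick, and `transcribe` for a wrapper around Whisper that takes the audio plus a text prompt and returns segments with `start`, `end` and `text`. The "interleaving prompts" part is read here as feeding the text transcribed so far as the prompt for the next speaker, which is only one possible reading.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Segment:
    speaker: int   # index of the separated source this segment came from
    start: float   # seconds from the start of the recording
    end: float
    text: str


# Hypothetical signatures: plug in your own separation model and ASR wrapper.
Separator = Callable[[Sequence[float]], List[Sequence[float]]]
Transcriber = Callable[[Sequence[float], str], List[Dict]]


def transcribe_two_phase(
    mixture: Sequence[float],
    separate: Separator,
    transcribe: Transcriber,
) -> List[Segment]:
    # Phase 1: split the mixed recording into one source per speaker.
    sources = separate(mixture)

    # Phase 2: transcribe each source in isolation.
    segments: List[Segment] = []
    prompt = ""
    for speaker, source in enumerate(sources):
        for seg in transcribe(source, prompt):
            segments.append(
                Segment(speaker, seg["start"], seg["end"], seg["text"].strip())
            )
        # Assumption: pass everything transcribed so far, in time order,
        # as the prompt for the next speaker's source.
        in_time_order = sorted(segments, key=lambda s: s.start)
        prompt = " ".join(s.text for s in in_time_order)

    # Interleave all speakers' segments back into one timeline.
    return sorted(segments, key=lambda s: (s.start, s.speaker))
```

With the openai-whisper package, `transcribe` could plausibly be a thin wrapper around `model.transcribe(audio, initial_prompt=prompt)["segments"]`, but I have not tested that on separated audio, and how well it works will depend a lot on how clean the separation is.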