by embedding-shape 227 days ago
Don't they have models that do text-to-speech, and maybe even speech-to-text? If so, there must be text in the datasets; otherwise I'm not sure how they'd accomplish that.