Hacker News
by hosolmaz 1127 days ago
> Do you plan on including AWS Polly in the speech generation?

I wasn't, but it isn't hard to integrate a new TTS service. You can copy an existing one, e.g. AzureService, and adapt it to work with AWS:

https://voiceover.manim.community/en/stable/services.html
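For word timing, Polly can return "speech marks" next to the audio when `synthesize_speech` is called with `OutputFormat="json"` and `SpeechMarkTypes=["word"]`. Each mark is one JSON line with `time` in milliseconds and `start`/`end` as byte offsets into the UTF-8 input. Below is a rough sketch of a helper that turns those marks into (seconds, character offset, word) tuples. It assumes plain-text input, not SSML, because with SSML the offsets refer to the markup. The function name `parse_polly_word_marks` is hypothetical. It is not part of manim-voiceover, and it does not reproduce the interface of AzureService.

```python
import json


def parse_polly_word_marks(text: str, marks_jsonl: str) -> list[tuple[float, int, str]]:
    """Hypothetical helper: convert Polly word speech marks into
    (seconds, char_offset, word) tuples.

    Assumes `text` is the plain (non-SSML) text sent to Polly. Polly reports
    `start`/`end` as UTF-8 byte offsets, so they are converted to character
    offsets, which is what string-index-based alignment needs.
    """
    encoded = text.encode("utf-8")
    boundaries = []
    for line in marks_jsonl.splitlines():
        line = line.strip()
        if not line:
            continue
        mark = json.loads(line)
        if mark.get("type") != "word":
            continue  # skip sentence/viseme/ssml marks
        # Decode the byte prefix before the word to count characters.
        char_offset = len(encoded[: mark["start"]].decode("utf-8"))
        boundaries.append((mark["time"] / 1000.0, char_offset, mark["value"]))
    return boundaries
```

For the text `"Hello café world"`, the word "world" starts at byte 12 because "é" takes two bytes, but its character offset is 11. The conversion above handles that.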

> How do you approach aligning the words with animations? Is it possible to align an animation with a specific word?

Indeed, the feature is called "bookmarks". You can see a demo here:

https://voiceover.manim.community/en/stable/quickstart.html#...
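In the quickstart, a bookmark is written inline in the voiceover text as a tag like `<bookmark mark='A'/>`, and the scene waits for it with `self.wait_until_bookmark("A")`. To time a bookmark, its tag first has to be removed from the spoken text, and its character position has to be recorded. The following sketch shows that step. `extract_bookmarks` and its regex are assumptions for illustration, not the library's actual parser.

```python
import re

# Hypothetical pattern for tags like <bookmark mark='A'/> or <bookmark mark="A" />.
BOOKMARK_RE = re.compile(r"""<bookmark\s+mark\s*=\s*['"]([^'"]+)['"]\s*/>""")


def extract_bookmarks(text: str) -> tuple[str, dict[str, int]]:
    """Strip bookmark tags and return (spoken_text, {mark: char_offset}).

    Each offset is the index in the spoken text where the tag sat, i.e. the
    start of whatever follows it.
    """
    parts = []
    offsets = {}
    pos = 0      # read position in the original text
    length = 0   # number of characters emitted to the spoken text so far
    for m in BOOKMARK_RE.finditer(text):
        chunk = text[pos:m.start()]
        parts.append(chunk)
        length += len(chunk)
        offsets[m.group(1)] = length
        pos = m.end()
    parts.append(text[pos:])
    return "".join(parts), offsets
```

For `"This circle is drawn <bookmark mark='A'/>as I speak."`, the spoken text is `"This circle is drawn as I speak."`, and mark `A` maps to offset 21, the start of "as". Converting that offset to a time still requires the word timestamps from the TTS service or from a speech recognizer.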

> Did you already describe it somewhere?

I haven't done a detailed write-up yet, but you can search the repo for "word_boundaries" and "TimeInterpolator". Services like Azure return a timestamp for the beginning of each word; for services that don't, I integrated Whisper to generate those timestamps from the audio. After that, it's a matter of mapping string indices to audio time via interpolation (I used linear interpolation).
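Here is a minimal sketch of the index-to-time mapping described above, written as piecewise-linear interpolation over (character offset, seconds) pairs. It is not the repo's `TimeInterpolator`. The names `make_time_interpolator` and `points` are hypothetical. It also assumes that the word-start offsets are distinct and sorted, and that indices outside the covered range are clamped.

```python
from bisect import bisect_right
from typing import Callable


def make_time_interpolator(points: list[tuple[int, float]]) -> Callable[[float], float]:
    """Return f(char_index) -> seconds using piecewise-linear interpolation.

    `points` are (char_offset, seconds) pairs, e.g. word starts with their
    timestamps, optionally plus (len(text), audio_duration) as an end anchor.
    Offsets must be strictly increasing.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if len(xs) < 2 or any(a >= b for a, b in zip(xs, xs[1:])):
        raise ValueError("need >= 2 points with strictly increasing offsets")

    def interpolate(index: float) -> float:
        # Clamp indices outside the known range to the first/last timestamp.
        if index <= xs[0]:
            return ys[0]
        if index >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, index)  # now xs[i-1] <= index < xs[i]
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (index - x0) / (x1 - x0)

    return interpolate
```

With made-up timings for `"Hello café world"`, `[(0, 0.0), (6, 0.4), (11, 0.9), (16, 1.3)]`, index 6 maps to 0.4 s, and index 3, which falls halfway through "Hello", maps to about 0.2 s. Linear interpolation assumes that characters within a word are spoken at a roughly even rate. This is usually good enough, because alignment targets such as bookmarks tend to sit at word starts, where the timestamps are exact.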


Thanks for the answers!