| > Do you plan including AWS Polly in the speech generation? I wasn't, but it's not that hard to integrate a new TTS service. You can copy e.g. AzureService and adapt it to work with AWS: https://voiceover.manim.community/en/stable/services.html > How do you approach aligning the words with animations? Is it possible to align it for a specific word? Yes, the feature is called "bookmarks". You can see a demo here: https://voiceover.manim.community/en/stable/quickstart.html#... > Did you already describe it somewhere? I haven't done a detailed write-up yet, but you can search the repo for "word_boundaries" and "TimeInterpolator". Services like Azure return a timestamp for the beginning of each word, and for services that don't, I integrated Whisper to generate the timestamps from the audio. Then it's a matter of mapping string indices to audio time via some sort of interpolation (I used linear). |
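To make the last step concrete, here is a minimal sketch of mapping a character index to audio time by linear interpolation between word-start timestamps. It is not the repo's actual `TimeInterpolator`: the function name `make_time_interpolator`, the `(char_offset, start_time)` tuple format, and the choice to pin the end of the text to the end of the audio are all assumptions made for illustration.

```python
from bisect import bisect_right


def make_time_interpolator(word_boundaries, text_length, audio_duration):
    """Return a function mapping a character index to a time in seconds.

    word_boundaries: iterable of (char_offset, start_time_seconds) pairs,
    one per word (hypothetical format, not the library's own).
    """
    points = sorted(word_boundaries)
    # Assumption: the text starts at t=0 and ends when the audio ends.
    if not points or points[0][0] > 0:
        points.insert(0, (0, 0.0))
    if points[-1][0] < text_length:
        points.append((text_length, audio_duration))
    offsets = [p[0] for p in points]
    times = [p[1] for p in points]

    def char_to_time(index):
        # Clamp to the known range, then find the surrounding anchors.
        index = min(max(index, offsets[0]), offsets[-1])
        i = bisect_right(offsets, index) - 1
        if i >= len(offsets) - 1:
            return times[-1]
        x0, x1 = offsets[i], offsets[i + 1]
        t0, t1 = times[i], times[i + 1]
        # Linear interpolation between the two nearest word starts.
        return t0 + (t1 - t0) * (index - x0) / (x1 - x0)

    return char_to_time


# "Hello brave world": words start at characters 0, 6 and 12.
to_time = make_time_interpolator(
    [(0, 0.0), (6, 0.5), (12, 1.0)], text_length=17, audio_duration=1.5
)
print(to_time(9))  # character 9 falls halfway between the 2nd and 3rd word starts
```

A bookmark placed at a given string index can then be resolved to a timestamp with `to_time(index)`, and the animation scheduled to start at that point in the audio.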