by zukerpie
1127 days ago
Really nice! I'm also playing around a lot with automatically generated videos, and I can see this having a lot of potential. Some questions that come to mind:
1. Do you plan on including AWS Polly in the speech generation? It has a free tier and a nice API, so it might also be a good choice for people already using the AWS SDK in their projects.
2. How do you approach aligning the words with animations? Is it possible to align an animation with a specific word? I was wondering how one might approach this. Did you already describe it somewhere?

I wasn't planning to, but integrating a new TTS service isn't hard. You can copy an existing service, e.g. AzureService, and adapt it to work with AWS:
https://voiceover.manim.community/en/stable/services.html
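For the word-alignment side of a Polly integration, Polly can return word-level "speech marks" (newline-delimited JSON objects with `time` in milliseconds, `type`, and `start`/`end` byte offsets into the input text), which could play the role of Azure's word timestamps. The sketch below does not call AWS at all; it only parses output in that format into `(char_offset, seconds)` pairs. The function name `parse_polly_word_marks` is hypothetical and not part of manim-voiceover or boto3.

```python
import json
from typing import List, Tuple


def parse_polly_word_marks(text: str, marks_jsonl: str) -> List[Tuple[int, float]]:
    """Hypothetical helper: turn Polly-style speech marks into (char_offset, seconds).

    Assumes one JSON object per line with "time" (ms), "type", and "start"
    (a UTF-8 byte offset into `text`).
    """
    encoded = text.encode("utf-8")
    boundaries = []
    for line in marks_jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines
        mark = json.loads(line)
        if mark["type"] != "word":
            continue  # ignore sentence/viseme/ssml marks
        # Byte offsets differ from character offsets for non-ASCII text.
        char_offset = len(encoded[: mark["start"]].decode("utf-8"))
        boundaries.append((char_offset, mark["time"] / 1000.0))
    return boundaries


# Usage with a hand-written sample (not real Polly output):
sample = "\n".join([
    '{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}',
    '{"time":373,"type":"word","start":5,"end":8,"value":"had"}',
])
print(parse_polly_word_marks("Mary had a little lamb", sample))
```

Converting byte offsets to character offsets matters because the later alignment step works on string indices, and the two diverge as soon as the text contains accented characters.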
> How do you approach aligning the words with animations? Is it possible to align it for a specific word?
Yes, that's possible; the feature is called "bookmarks". You can see a demo here:
https://voiceover.manim.community/en/stable/quickstart.html#...
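Conceptually, a bookmark is just a marker at some position in the narration text, so the first step is knowing which character index it sits at once the markup is removed. As a rough illustration (not the library's actual parser), the sketch below strips XML-style tags, assuming the form `<bookmark mark='A'/>`, and records each bookmark's offset in the plain text. `strip_bookmarks` and `BOOKMARK_RE` are hypothetical names.

```python
import re
from typing import Dict, Tuple

# Assumed tag form: <bookmark mark='A'/> or <bookmark mark="A"/>
BOOKMARK_RE = re.compile(r"""<bookmark\s+mark=(['"])(.*?)\1\s*/>""")


def strip_bookmarks(text: str) -> Tuple[str, Dict[str, int]]:
    """Remove bookmark tags; return the plain text and each mark's char offset in it."""
    parts = []
    offsets: Dict[str, int] = {}
    pos = 0        # read position in the original (tagged) text
    plain_len = 0  # length of the plain text emitted so far
    for m in BOOKMARK_RE.finditer(text):
        chunk = text[pos:m.start()]
        parts.append(chunk)
        plain_len += len(chunk)
        offsets[m.group(2)] = plain_len  # the mark points at the next character
        pos = m.end()
    parts.append(text[pos:])
    return "".join(parts), offsets


plain, marks = strip_bookmarks("This is <bookmark mark='A'/>a circle.")
print(plain, marks)
```

The plain text is what gets sent to the TTS service, while the offsets are the string indices that later need to be mapped to audio time.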
> Did you already describe it somewhere?
I haven't done a detailed write-up yet, but you can search the repo for "word_boundaries" and "TimeInterpolator". Services like Azure return a timestamp for the beginning of each word, and for services that don't, I integrated Whisper to generate them from the audio. From there, it's a matter of mapping string indices to audio time via interpolation (I used linear interpolation).
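The index-to-time mapping described above can be sketched as piecewise-linear interpolation over `(char_offset, start_time)` anchors. This is a minimal sketch under stated assumptions, not the repo's `TimeInterpolator`: the helper names are hypothetical, and treating "end of text = end of audio" as a final anchor is an assumption made here so that indices inside the last word can be interpolated.

```python
from bisect import bisect_right
from typing import List, Tuple

Anchor = Tuple[int, float]  # (char_offset, seconds)


def build_anchors(word_starts: List[Anchor], text_len: int, duration: float) -> List[Anchor]:
    """Hypothetical: word start times plus an assumed (end of text, end of audio) anchor."""
    return sorted(word_starts) + [(text_len, duration)]


def char_index_to_time(anchors: List[Anchor], char_index: int) -> float:
    """Linearly interpolate the audio time for a character index."""
    xs = [x for x, _ in anchors]
    ys = [y for _, y in anchors]
    if char_index <= xs[0]:
        return ys[0]   # before the first word: clamp
    if char_index >= xs[-1]:
        return ys[-1]  # at or past the end of the text: clamp
    i = bisect_right(xs, char_index) - 1  # xs[i] <= char_index < xs[i + 1]
    x0, y0 = xs[i], ys[i]
    x1, y1 = xs[i + 1], ys[i + 1]
    return y0 + (y1 - y0) * (char_index - x0) / (x1 - x0)


# "Hello big world": words start at chars 0, 6, 10 (made-up timings).
anchors = build_anchors([(0, 0.0), (6, 0.5), (10, 0.8)], text_len=15, duration=1.5)
print(round(char_index_to_time(anchors, 8), 3))  # halfway between 0.5 s and 0.8 s
```

Linear interpolation assumes characters are spoken at a roughly constant rate between two known word starts, which is crude but usually close enough for triggering an animation.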