by zukerpie 1127 days ago
Really nice! I'm also playing around a lot with automatically generated videos right now, and I can see this having a lot of potential! Some questions that come to mind:

1. Do you plan to include AWS Polly in the speech generation? It has a free tier and the API is nice, so it might also be a good choice for people who already use the AWS SDK in their projects.

2. How do you approach aligning the words with animations? Is it possible to align an animation to a specific word? I was wondering how one might approach this. Did you already describe it somewhere?
1 comments

> Do you plan including AWS Polly in the speech generation?

I wasn't planning to, but it's not that hard to integrate a new TTS service. You can copy an existing one, e.g. AzureService, and adapt it to work with AWS:

https://voiceover.manim.community/en/stable/services.html
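
As a rough idea of what the AWS side of such an adapter might involve, here is a minimal sketch, not code from the repo. It assumes boto3 with configured credentials, and it assumes Polly's "word" speech marks (newline-delimited JSON with `time`, `type`, `start`, `end`, `value`) are the timestamp source. The helper names and the output keys (`text`, `text_offset`, `audio_offset_s`) are hypothetical; the real services in the repo may use a different shape, so check AzureService before copying this.

```python
import json
from typing import Dict, List


def parse_polly_word_marks(speech_marks: str) -> List[Dict]:
    """Turn Polly speech-mark output (one JSON object per line) into
    a list of word boundaries. Output key names are hypothetical."""
    boundaries = []
    for line in speech_marks.splitlines():
        line = line.strip()
        if not line:
            continue
        mark = json.loads(line)
        if mark.get("type") != "word":
            continue  # skip sentence/viseme/ssml marks
        boundaries.append({
            "text": mark["value"],
            # Polly reports "start" as a byte offset into the input text,
            # which only equals the character index for ASCII input.
            "text_offset": mark["start"],
            # Polly reports "time" in milliseconds from the audio start.
            "audio_offset_s": mark["time"] / 1000.0,
        })
    return boundaries


def fetch_polly_word_marks(text: str, voice_id: str = "Joanna") -> List[Dict]:
    """Request word-level speech marks from Polly (needs AWS credentials)."""
    import boto3  # imported lazily so the parser above works without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="json",        # speech marks are returned as JSON lines
        SpeechMarkTypes=["word"],
        VoiceId=voice_id,
    )
    return parse_polly_word_marks(response["AudioStream"].read().decode("utf-8"))
```

The audio itself would come from a second `synthesize_speech` call with `OutputFormat="mp3"`, since Polly returns either audio or speech marks per request, not both.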

> How do you approach aligning the words with animations? Is it possible to align it for a specific word?

Indeed, the feature is called "bookmarks". You can see a demo here:

https://voiceover.manim.community/en/stable/quickstart.html#...

> Did you already describe it somewhere?

I did not do a detailed write-up yet, but you can search the repo for "word_boundaries" and "TimeInterpolator". Services like Azure return timestamps for the beginning of each word, and for those that don't, I integrated Whisper to generate them from the audio. After that, it's a matter of mapping string indices to audio time via some sort of interpolation (I used linear).
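
To make the interpolation step concrete, here is a minimal sketch of the idea, not the actual TimeInterpolator from the repo. It assumes the word boundaries are available as (text offset, audio time in seconds) pairs with strictly increasing offsets; the function name and the input shape are made up for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple


def text_offset_to_time(
    boundaries: List[Tuple[int, float]], offset: int
) -> float:
    """Map a character index in the spoken text to a time in the audio.

    boundaries: (text_offset, audio_time_s) for the start of each word,
    sorted by text_offset with no duplicates (an assumption of this sketch).
    Offsets between two word starts are interpolated linearly; offsets
    outside the known range are clamped to the first or last timestamp.
    """
    if not boundaries:
        raise ValueError("need at least one word boundary")

    offsets = [o for o, _ in boundaries]
    if offset <= offsets[0]:
        return boundaries[0][1]
    if offset >= offsets[-1]:
        return boundaries[-1][1]

    # Index of the first boundary that starts strictly after `offset`.
    i = bisect_right(offsets, offset)
    (x0, t0), (x1, t1) = boundaries[i - 1], boundaries[i]
    return t0 + (t1 - t0) * (offset - x0) / (x1 - x0)
```

With this, a bookmark placed at some character index in the text gets a timestamp by looking up that index, e.g. `text_offset_to_time([(0, 0.0), (6, 0.5), (12, 1.5)], 9)` falls halfway between the second and third word starts.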

Thanks for the answers!