Subformer: Multilingual video dubbing with speaker diarization and voice cloning (subformer.com)
3 points by mashreghi 152 days ago
1 comment

Hi HN,

We built Subformer (https://subformer.com), a web app that dubs videos into other languages while keeping speaker identity intact.

Most “AI dubbing” pipelines are just ASR → translation → TTS, which breaks as soon as you have multiple speakers. We instead run:

- VAD + speaker diarization
- Audio demixing
- Global speaker clustering
- Per-segment ASR + translation
- Per-speaker TTS (voice cloning or synthetic)
- Timeline-aligned remuxing back into the video
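
To make the "global speaker clustering" step concrete, here is a minimal sketch, not Subformer's actual code. It assumes the video is diarized in chunks, that each chunk-local speaker comes with an embedding vector, and that a cosine-similarity threshold of 0.75 is a reasonable cutoff. The function names, the threshold, and the greedy strategy are all illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_global_speakers(chunk_speakers, threshold=0.75):
    """Map (chunk_id, local_label) pairs to global speaker IDs.

    chunk_speakers: iterable of (chunk_id, local_label, embedding).
    Greedy clustering: each embedding joins the most similar existing
    cluster if similarity >= threshold, otherwise it starts a new one.
    The threshold value is an assumption and would need tuning.
    """
    cluster_sums = []  # running sum of embeddings per global speaker
    mapping = {}
    for chunk_id, local_label, emb in chunk_speakers:
        best_id, best_sim = None, threshold
        for gid, vec_sum in enumerate(cluster_sums):
            # Cosine is scale-invariant, so the sum stands in for the mean.
            sim = cosine(emb, vec_sum)
            if sim >= best_sim:
                best_id, best_sim = gid, sim
        if best_id is None:
            cluster_sums.append(list(emb))
            best_id = len(cluster_sums) - 1
        else:
            cluster_sums[best_id] = [
                s + e for s, e in zip(cluster_sums[best_id], emb)
            ]
        mapping[(chunk_id, local_label)] = best_id
    return mapping
```

With this, a speaker labeled "A" in one chunk and "B" in the next ends up with the same global ID as long as the embeddings are close, which is what lets a per-speaker TTS voice stay consistent across the whole video.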

The tricky parts were diarization drift on long videos, timing mismatches after translation, and keeping costs sane when doing multilingual TTS at scale.
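
For the timing mismatch specifically, one common approach (shown here as a sketch under assumptions, not a description of Subformer's implementation) is to time-stretch each synthesized segment toward its original slot, within limits that keep speech natural. The function name and the 0.9 and 1.25 rate limits below are hypothetical.

```python
def fit_to_slot(slot_s, tts_s, min_rate=0.9, max_rate=1.25):
    """Pick a playback rate so a TTS clip fits its original time slot.

    slot_s: duration of the original speech segment, in seconds.
    tts_s:  duration of the synthesized translation, in seconds.
    Returns (rate, fitted_duration, overflow), where overflow is how many
    seconds the clip still spills past the slot after clamping the rate.
    The rate limits are assumed values, not measured thresholds.
    """
    if slot_s <= 0 or tts_s <= 0:
        raise ValueError("durations must be positive")
    ideal_rate = tts_s / slot_s           # >1 means the clip must speed up
    rate = min(max(ideal_rate, min_rate), max_rate)
    fitted = tts_s / rate
    overflow = max(0.0, fitted - slot_s)
    return rate, fitted, overflow
```

A segment with non-zero overflow would need another remedy, such as borrowing from the following silence or requesting a shorter translation. The returned rate could be applied with any time-stretch tool, for example FFmpeg's `atempo` filter.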

It’s still early, but it already works well for things like interviews, TV clips, and YouTube videos with multiple speakers.

Would love feedback from people who work on audio, speech, or localization.

https://subformer.com

Congratulations on your work! How much better is it than established video translators like https://videodubber.ai, ElevenLabs, https://rask.ai, or HeyGen? Those options also offer speaker diarization and timeline-aligned remuxing, right?