by lukan 848 days ago
"people tell me they can’t stand subtitles because it “reveals” what they’re going to say before they say it."

I love watching movies in the original language, and I hate this as well, but it is something that can be avoided.

Some movies get it right, though: the timing, only the words that are actually spoken, and even different colors for different speakers (very rare, I cannot even remember where I have seen it). That should be standard, but with most movies you are lucky if the subs even match the plot and do not reveal too much.


Some of the best subtitles I've ever seen were on Tom Scott's YouTube channel. They use different colours, indicators for jokes and sarcasm, while also staying relatively close to what's actually been said. They're better than many big-budget movies and TV shows I've seen.

He talked about subtitling at some point, and I was surprised how cheap subtitling services are. I think he went beyond the price he mentioned, but it really made me question why big, profitable YouTube channels aren't spending the small change to get at least native-language subtitles that Google can translate, instead of relying on YouTube's terrible automatic captions.

That said, Whisper seems to generate quite good subtitles that take short pauses into account for timing, but they're obviously never going to be as good as a human who actually understands the context of what's being said.

Whisper can also generate timings at the word level, which you could use to make better-timed subtitles.

Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]

[0] https://github.com/linto-ai/whisper-timestamped

That's a great use case for LLMs, actually. Translate the sentence only up to what has been said so far. Basically, a balance between translating word-for-word (perfect timing, but terrible grammar) and translating the whole sentence and/or thought (perfect grammar and meaning, but potentially terrible timing).

With the SRT file format for subtitles, I think, there's no reason why one couldn't make groups of words appear as they are spoken.
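
A rough sketch of that idea in Python, in case it helps. It assumes you already have word-level timings as a list of dicts with "text", "start" and "end" keys (in seconds); that shape is my assumption, not the output format of any particular tool, so adapt it to whatever your transcriber gives you. The names srt_timestamp and words_to_srt and the max_words / max_gap thresholds are made up for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=3, max_gap=0.6):
    """Group timed words into short cues and render them as SRT text.

    A new cue starts when the current one already holds max_words words,
    or when the pause before the next word exceeds max_gap seconds.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    groups, current = [], []
    for word in words:
        pause = word["start"] - current[-1]["end"] if current else 0.0
        if current and (len(current) >= max_words or pause > max_gap):
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)

    blocks = []
    for index, group in enumerate(groups, start=1):
        start = srt_timestamp(group[0]["start"])
        end = srt_timestamp(group[-1]["end"])
        text = " ".join(w["text"] for w in group)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical timings for a sentence spoken with two long pauses.
words = [
    {"text": "I", "start": 0.0, "end": 0.2},
    {"text": "will", "start": 0.25, "end": 0.5},
    {"text": "go", "start": 1.5, "end": 1.7},
    {"text": "to", "start": 1.75, "end": 1.9},
    {"text": "the", "start": 3.2, "end": 3.3},
    {"text": "cinema", "start": 3.35, "end": 3.9},
]

print(words_to_srt(words))
```

Each cue only covers the words spoken in that stretch, so nothing is shown before it is said. The obvious cost is that the text changes much more often, which may be tiring to read, so the grouping thresholds would need some experimenting.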

Actually, I have to do the same thing when generating the dubbed voices. Otherwise it feels as though the AI voice is saying something different than the person in the video, especially when the AI finishes speaking and you still hear some of the last words from the original speaker.

Unfortunately not all languages follow the same sentence structure, so translating "up to what has been said so far" is not possible.

Assume two dramatic pauses in an English sentence, and compare the Turkish version. In English you can say "I will.. go to.... the cinema", but in Turkish it becomes "Ben... sinemaya... gidecegim" (I .. to the cinema.. go), so the verb only arrives at the very end.

I am sure there are smarter examples.

>different colors for different persons speaking

BBC iPlayer does this for some content. I don't know if it's ever done for movies, though.

It is. The iPlayer subtitles for Citizen Kane use colour to distinguish speakers.