by timetocoffee 1988 days ago
Hi Chony, thanks for asking.

1. What method did you use to get the summary out of all the subtitles?

I measured the semantic similarity between the words in each sentence. If the words in two sentences are not semantically similar enough, the sentences are split into two different chapters. To measure semantic similarity I used word2vec (something like BERT would be more accurate, but this is just a prototype).
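
A minimal sketch of the idea, not the prototype's actual code. It assumes each sentence is compared with the one before it, that a sentence is represented by the average of its word vectors, and that a cosine similarity below a made-up threshold of 0.5 starts a new chapter. The tiny `TOY_EMBEDDINGS` table is a stand-in for real word2vec vectors, and all function names are hypothetical.

```python
import math

# Toy 2-d vectors standing in for real word2vec embeddings (assumption).
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0], "dog": [0.9, 0.1],
    "stock": [0.0, 1.0], "market": [0.1, 0.9],
}

def sentence_vector(sentence, embeddings):
    """Average the vectors of the words that have an embedding."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        return None  # no known words, so no vector
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def split_into_chapters(sentences, embeddings, threshold=0.5):
    """Start a new chapter when a sentence is dissimilar to the previous one.

    The threshold is a hypothetical value; it would need tuning on real data.
    """
    chapters = []
    prev_vec = None
    for sentence in sentences:
        vec = sentence_vector(sentence, embeddings)
        is_break = (
            vec is not None
            and prev_vec is not None
            and cosine(prev_vec, vec) < threshold
        )
        if not chapters or is_break:
            chapters.append([sentence])
        else:
            chapters[-1].append(sentence)
        if vec is not None:
            prev_vec = vec  # sentences with no known words do not reset the context
    return chapters

chapters = split_into_chapters(
    ["cat dog", "dog cat", "stock market"], TOY_EMBEDDINGS
)
```

With the toy vectors, the first two sentences land in one chapter and the third starts a new one. With a real model you would load the word vectors into the same kind of word-to-vector lookup.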

2. How to get the subtitles of the video (Youtube API)?

The subtitles are available in the HTML of the YouTube video page, so you can write a crawler to get them. The YouTube API might also be an option.

3. How to get the timestamp of the specific word in the subtitle? I would really like to build something similar! Thanks a lot!

Since the timestamps are sentence-level only, there is no perfect way to get one for each word. You would need to approximate it, and I didn't do that in my case.
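
One rough way to approximate it (again, not something the prototype does) is to spread each sentence's time span across its words in proportion to their length. The sketch below assumes you already have a sentence's text plus its start and end time in seconds; the function name and the character-count weighting are assumptions for illustration only.

```python
def approximate_word_times(text, start, end):
    """Estimate a start time for each word by linear interpolation.

    Each word gets a share of the sentence's duration proportional to its
    character count. This is a crude guess, not a forced alignment.
    """
    words = text.split()
    total_chars = sum(len(w) for w in words)
    if total_chars == 0:
        return []
    duration = end - start
    times = []
    elapsed_chars = 0
    for word in words:
        times.append((word, start + duration * elapsed_chars / total_chars))
        elapsed_chars += len(word)
    return times

word_times = approximate_word_times("aa bb", 10.0, 12.0)
```

Here the first word starts at the sentence start and the second one halfway through, since both words are the same length. Real speech is not this regular, so treat the result as a rough seek position only.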

Hope the answers are helpful. Let me know if you have more questions!