Hacker News
by chony 1988 days ago
I love the idea so much! Any idea how you could make it work on videos with subtitles turned off?

Also, may I ask for a simple workflow of this project? Here are some of my questions.

1. What method did you use to get the summary out of all the subtitles?

2. How to get the subtitles of the video (YouTube API)?

3. How to get the timestamp of a specific word in the subtitle?

I would really like to build something similar! Thanks a lot!

1 comment

Hi Chony, thanks for asking.

1. What method did you use to get the summary out of all the subtitles?

I measured the semantic similarity between the words in each sentence. If the words in two sentences are not very similar, the sentences are split into two different chapters. To measure similarity I used word2vec (something like BERT would be more accurate, but this is just a prototype).
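
A minimal sketch of this kind of similarity-based splitting, under several assumptions that the reply does not confirm: each sentence is represented by the average of its word vectors, only consecutive sentences are compared, and a fixed cosine threshold decides where a chapter ends. The `TOY_VECTORS` table is a hypothetical stand-in for a trained word2vec model, and the names `sentence_vector`, `split_into_chapters`, and `threshold` are illustrative only.

```python
import math

# Hypothetical 2-d vectors standing in for a trained word2vec model.
TOY_VECTORS = {
    "cat": (1.0, 0.1), "dog": (0.9, 0.2), "pet": (0.95, 0.15),
    "stock": (0.1, 1.0), "market": (0.2, 0.9), "price": (0.15, 0.95),
}

def sentence_vector(sentence, vectors):
    """Average the vectors of the words we have embeddings for."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None  # no known words, so no vector for this sentence
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def split_into_chapters(sentences, vectors, threshold=0.8):
    """Start a new chapter when a sentence is dissimilar to the previous one.

    The threshold value is an assumption; it would need tuning on real data.
    """
    chapters = []
    prev_vec = None
    for sentence in sentences:
        vec = sentence_vector(sentence, vectors)
        is_break = (
            prev_vec is not None
            and vec is not None
            and cosine(prev_vec, vec) < threshold
        )
        if not chapters or is_break:
            chapters.append([sentence])
        else:
            chapters[-1].append(sentence)
        if vec is not None:
            prev_vec = vec  # sentences without known words keep the old vector
    return chapters

sentences = [
    "My cat is a pet.",
    "The dog is a pet.",
    "Stock market price.",
    "The market price.",
]
chapters = split_into_chapters(sentences, TOY_VECTORS)
```

With real data the vector lookup would come from a trained embedding model, and the sentences would be the subtitle lines of the video.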

2. How to get the subtitles of the video (YouTube API)?

Subtitles are available in the HTML of the YouTube video page, so you can write a crawler to get them. The YouTube API might also be an option.

3. How to get the timestamp of a specific word in the subtitle?

Since timestamps are sentence-level only, there is no perfect way to get them for each word. You would need to approximate them, which I didn't do in my case.
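
One possible approximation, which is not something the prototype does, is to spread each sentence's time span across its words in proportion to their character length. The sketch below assumes each subtitle line comes with a start and end time in seconds; the function name `approximate_word_times` is hypothetical.

```python
def approximate_word_times(text, start, end):
    """Estimate a start time for each word in one subtitle line.

    The line's duration is divided among its words in proportion to
    their character counts. This is only a rough guess: it ignores
    pauses and differences in speaking speed.
    """
    words = text.split()
    total_chars = sum(len(w) for w in words)
    if total_chars == 0:
        return []
    duration = end - start
    times = []
    chars_so_far = 0
    for word in words:
        # The word starts after the share of time used by the earlier words.
        times.append((word, start + duration * chars_so_far / total_chars))
        chars_so_far += len(word)
    return times

word_times = approximate_word_times("ab cd", 0.0, 4.0)
```

Longer words get a larger share of the line's duration, so the estimate drifts less than an even split per word would, but it is still only an approximation.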

Hope the answers are helpful. Let me know if you have more questions!