Show HN: Automatic YouTube Summarization (Chapter Generation) (segmentel.com)
5 points by timetocoffee 1988 days ago
3 comments

There are way more YouTube videos than we could ever watch. I thought it would be nice if AI could help summarize them to save us time. So I spent my Christmas making this tool.

All you have to do is paste your video URL (with CC turned on) into the website, and it will generate chapters (a kind of summary) for you. You can use them to find the parts you're interested in. It might not be as accurate as human-generated chapters, but it can give you a quick tour of what the video is about.

It took me longer than I thought to develop this web app. I spent most of the time dealing with its deployment (so many hurdles!).

Let me know if you think it's helpful or not, and how I can improve it. Thank you very much!

I love the idea so much! Any idea how you could make it work on videos with subtitles turned off?

Also, may I ask for a simple workflow of this project? Here are some of my questions.

1. What method did you use to get the summary out of all the subtitles?

2. How to get the subtitles of the video (YouTube API)?

3. How to get the timestamp of the specific word in the subtitle?

I would really like to build something similar! Thanks a lot!

Hi Chony, thanks for asking.

1. What method did you use to get the summary out of all the subtitles?

I measured the similarity between the words in each sentence. If the words in two sentences are not very semantically similar, the sentences are split into two different chapters. To measure semantic similarity, I used word2vec (something like BERT would be more accurate, but this is just a prototype).
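For anyone who wants to try this, here is a minimal sketch of similarity-based chapter splitting. It is not the tool's actual code. It assumes each sentence is represented by the average of its word vectors and that only adjacent sentences are compared. The `TOY_VECTORS` table, the function names, and the `threshold` value are all made up for illustration; real word2vec vectors would replace the toy table.

```python
import math

# Toy 2-d vectors standing in for real word2vec embeddings (hypothetical values).
TOY_VECTORS = {
    "cat": (1.0, 0.0), "dog": (0.9, 0.1), "pet": (0.8, 0.2),
    "stock": (0.0, 1.0), "market": (0.1, 0.9), "price": (0.2, 0.8),
}

def sentence_vector(sentence, vectors):
    """Average the vectors of the known words in a sentence (None if no word is known)."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def split_into_chapters(sentences, vectors, threshold=0.5):
    """Start a new chapter whenever a sentence is dissimilar to the previous one."""
    chapters = []
    prev_vec = None
    for sentence in sentences:
        vec = sentence_vector(sentence, vectors)
        is_break = (
            prev_vec is not None
            and vec is not None
            and cosine(prev_vec, vec) < threshold
        )
        if not chapters or is_break:
            chapters.append([sentence])
        else:
            chapters[-1].append(sentence)
        if vec is not None:
            prev_vec = vec  # sentences with no known words do not move the reference point
    return chapters

captions = ["my cat", "the dog is a pet", "stock market", "price of stock"]
chapters = split_into_chapters(captions, TOY_VECTORS)
```

With the toy vectors, the two animal sentences land in one chapter and the two finance sentences in another. The threshold would need tuning on real captions.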

2. How to get the subtitles of the video (YouTube API)?

Subtitles are available in the YouTube video page's HTML, so you can write a crawler to get them. The YouTube API might also be a way.

3. How to get the timestamp of the specific word in the subtitle?

As timestamps are sentence-level only, there is no perfect way to get them for each word. You would need to approximate them, which I didn't do in my case.
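One possible approximation, sketched here as an assumption rather than anything the tool does, is to spread a caption's time span evenly across its words. The function name `approximate_word_times` and the `(text, start, end)` inputs are hypothetical.

```python
def approximate_word_times(text, start, end):
    """Estimate a start time (in seconds) for each word by dividing the
    caption's [start, end) span evenly across its words."""
    words = text.split()
    if not words:
        return []
    step = (end - start) / len(words)
    return [(word, start + i * step) for i, word in enumerate(words)]

# A caption shown from 10.0 s to 13.0 s.
estimates = approximate_word_times("hello big world", 10.0, 13.0)
```

This ignores pauses and word length, so the estimates are rough. Weighting each word by its character count would be a small refinement.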

Hope the answers are helpful. Let me know if you have more questions!

Interesting!