by syserror 1583 days ago
Wow! I'm working on something exactly like this as a personal project!

I have a few questions about how you're doing time segmentation.

Are you using a purely text-based approach, i.e. a pipeline like STT -> some BERT-based model for segmentation/summary?

Or are you using a text + audio model to get extra signal?
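To illustrate what "extra signal" from the audio side could look like, here is a minimal sketch of my own, not anything the app's authors have confirmed. It assumes the STT step returns word-level timestamps and treats long silences between words as candidate segment boundaries, on the guess that pauses may line up with topic changes. The names `Word` and `pause_candidates` and the 1.5-second threshold are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def pause_candidates(words, min_pause=1.5):
    """Return (time, pause_length) for every gap between consecutive
    words that lasts at least min_pause seconds."""
    candidates = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start - prev.end
        if gap >= min_pause:
            # The candidate boundary is placed where speech resumes.
            candidates.append((nxt.start, gap))
    return candidates


# Made-up word timings, standing in for real STT output.
words = [
    Word("thanks", 0.0, 0.5),
    Word("everyone", 0.5, 1.0),
    Word("next", 3.0, 3.5),
    Word("topic", 3.5, 4.0),
]
print(pause_candidates(words))
```

These candidates would still need to be ranked or filtered by a text model; the pause alone only says where the speaker stopped talking, not why.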

Have you found any cool tricks (pre-processing/heuristics) that really improved your approach?

How are you dealing with dynamic ad insertion, where the inserted ads can differ in length from user to user? For example, user A might get one 30-second ad inserted while user B gets two ads totaling 60 seconds. That would shift all of your timestamps on a per-user basis. Or are you serving the podcasts from your own server?
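To make the shift concrete, here is a minimal sketch, assuming you already know where ads were spliced into a given user's copy (detecting that is the hard part and is not solved here). The names `AdInsertion` and `to_user_time` are hypothetical, and the positions and durations are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AdInsertion:
    canonical_pos: float  # where in the ad-free episode the ad was spliced in (seconds)
    duration: float       # length of the inserted ad (seconds)


def to_user_time(canonical_t: float, ads: list[AdInsertion]) -> float:
    """Map a timestamp in the ad-free episode to the same moment
    in one user's copy, given the ads inserted into that copy."""
    # Every ad spliced in at or before this point pushes it later.
    shift = sum(ad.duration for ad in ads if ad.canonical_pos <= canonical_t)
    return canonical_t + shift


# User A gets one 30-second ad; user B gets two ads totaling 60 seconds.
user_a = [AdInsertion(canonical_pos=0.0, duration=30.0)]
user_b = [AdInsertion(0.0, 30.0), AdInsertion(600.0, 30.0)]

chapter_start = 900.0  # a segment boundary in the ad-free episode
print(to_user_time(chapter_start, user_a))  # 930.0
print(to_user_time(chapter_start, user_b))  # 960.0
```

The same segment boundary lands at different offsets for each user, so timestamps computed once against a single download cannot be reused as-is.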

My project's goal is slightly different from yours: it is almost entirely about automatically skipping ads, at least until I'm able to just pay podcasters a subscription fee to support their content.

Awesome app! Super exciting to see folks working on this.

1 comment

Cool! That's great. I think there's still so much to do in the space of AI for spoken audio, be it podcasts, audiobooks, video calls or similar. So definitely keep at it.

I hope you understand that I can't reveal all of our tricks. What I can say is that it definitely helped us to think about what we as humans use as input when we process speech in these situations. It's not just the text; the audio has value as well. To see this, try a small experiment: look at the transcript alone and try to find the optimal segmentation points by hand. It's difficult.

Dynamic ad insertion is also still a challenge for us. We'd like to develop a technical solution for it but haven't found the time to tackle it yet.