by corobo 3835 days ago
We currently don't have the technology to automatically transcribe podcasts into text (good for the SEO on your podcast site). But even if we did, how would it know that something is an advert as opposed to part of the show? Most of the podcasts I listen to have ads, and in many cases the ads aren't just at the start or end; they're either a live read during the podcast or a cutaway somewhere towards the middle.

Maybe you're just not the target market for ad-funded podcasts.

1 comments

Cut any segment of 10 seconds or more that is found in more than one podcast episode. (I think this is feasible, considering Shazam.)

It would remove pre-recorded/repeated ads in the middle (assuming that's what cutaway means), but wouldn't be able to remove live reads. It would also remove intros/exits.
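A rough sketch of that idea, to make it concrete. It assumes each episode has already been reduced to a list of per-second audio fingerprints that match exactly when the audio matches, which is a big simplification of what Shazam-style matching really does. The function name `repeated_segments` and the input layout are made up for illustration.

```python
from collections import defaultdict

MIN_RUN = 10  # seconds; the threshold suggested above


def repeated_segments(episodes, min_run=MIN_RUN):
    """Map each episode id to a list of (start, end) second ranges
    (end exclusive) whose audio also appears in another episode.

    `episodes` maps an episode id to a list of per-second fingerprints
    (any hashable values). Turning audio into fingerprints is assumed
    to happen elsewhere and is not shown here.
    """
    # Index every window of `min_run` consecutive fingerprints,
    # remembering which episodes contain it.
    seen = defaultdict(set)
    for ep_id, prints in episodes.items():
        for i in range(len(prints) - min_run + 1):
            seen[tuple(prints[i:i + min_run])].add(ep_id)

    result = {}
    for ep_id, prints in episodes.items():
        ranges = []
        for i in range(len(prints) - min_run + 1):
            window = tuple(prints[i:i + min_run])
            if len(seen[window]) > 1:  # shared with at least one other episode
                start, end = i, i + min_run
                if ranges and start <= ranges[-1][1]:
                    # Overlaps or touches the previous range: extend it.
                    ranges[-1] = (ranges[-1][0], end)
                else:
                    ranges.append((start, end))
        result[ep_id] = ranges
    return result


# Toy data: a 12-second "ad" dropped into two otherwise unrelated episodes.
ad = [f"ad{i}" for i in range(12)]
episodes = {
    "ep1": [f"a{i}" for i in range(5)] + ad + [f"a{i}" for i in range(5, 10)],
    "ep2": ["b0", "b1"] + ad + ["b2"],
    "ep3": [f"c{i}" for i in range(20)],
}
cuts = repeated_segments(episodes)
```

With the toy data, the shared 12 seconds get flagged in `ep1` and `ep2` and nothing is flagged in `ep3`. As noted, a show's own intro and outro would be flagged the same way, and a live read never repeats so it never matches.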

That actually does sound feasible. I'm not sure how feasible it is on mobile devices, but it's definitely possible if things like YouTube's ContentID can work.
As I understand it, ContentID is notoriously over-aggressive in matching and only "works" in the sense that YouTube's interest in having it isn't particularly harmed by that, since it's mostly a tool to improve relations and avoid lawsuits from big media interests.

Without manual validation, it probably wouldn't be a good model for identifying and removing ads from podcasts, especially using a "repeated in multiple podcasts" model, which doesn't start with known ads.