by joshfriend 2722 days ago
That's a great idea if the injected ads are exactly the same length and always get put in the same place. If you have a 30-second ad in one download and a 40-second ad in another, 10 seconds of legitimate podcast audio gets interpreted as an advertisement.

It works even if they don't; you just need a more intensive algorithm to do similarity matching across all offsets.
With maximally repeated sequences, you can https://ieeexplore.ieee.org/abstract/document/6012115
You "just" need to find all the common segments between different downloads; those are very likely to be the actual content, and not differing ads. Naturally this doesn't work very well if the pool of ads is small and you get repeats, since a repeated ad looks just like common content.
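
For anyone who wants to play with the idea, here is a rough sketch of the "common segments" step, not a real ad remover. It assumes each download has already been reduced to a sequence of exact-match per-frame fingerprints (that step is hand-waved here; real audio would need fingerprints that survive re-encoding). The name `common_segments` and the `min_len` threshold are made up for illustration, and Python's `difflib.SequenceMatcher` stands in for a proper repeated-sequence algorithm.

```python
from difflib import SequenceMatcher


def common_segments(a, b, min_len=10):
    """Return (start_a, start_b, length) runs shared by both downloads.

    `a` and `b` are sequences of hashable per-frame fingerprints
    (an assumption: e.g. one hash per short audio window).
    """
    # autojunk=False so frequently repeated frames are not silently ignored.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (m.a, m.b, m.size)
        for m in matcher.get_matching_blocks()
        if m.size >= min_len  # also drops the zero-length sentinel block
    ]


# Toy data: the same 100 "frames" of content, with a 30-frame ad spliced
# into one download and a 40-frame ad spliced into the other.
content = list(range(100))
ad_30 = [1000 + i for i in range(30)]
ad_40 = [2000 + i for i in range(40)]
download_a = content[:50] + ad_30 + content[50:]
download_b = content[:50] + ad_40 + content[50:]

segments = common_segments(download_a, download_b)
print(segments)  # [(0, 0, 50), (80, 90, 50)]
```

The second segment starts at offset 80 in one download and 90 in the other, which is the 10-frame shift from the different ad lengths. Matching on content rather than on fixed positions is what lets the shift fall out naturally. With a small ad pool, the same ad in both downloads would show up as a shared segment too, so you would want more than two downloads to compare.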