I think you will need to filter out wire services like AP and Reuters, as the stories I'm seeing are mostly wire stories republished on random websites.
Rather than filtering them out, I’d imagine you’d want to establish their equivalence? Then they can be offered as equal or similar alternatives to the same article, so you can read it from your outlet of choice.
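
For what it's worth, here is a rough sketch of how that equivalence could be established. It assumes you already have each article's outlet name and body text, and it groups near-identical texts by the Jaccard similarity of their three-word shingles. The function names, the shingle size, and the 0.6 threshold are all made up for illustration and would need tuning on real data.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_equivalents(articles, threshold=0.6):
    """Greedily group (outlet, text) pairs whose texts are near-duplicates.

    Each article joins the first group whose first member is similar enough;
    otherwise it starts a new group. The threshold is an assumed value.
    """
    groups = []  # list of (shingles of first member, [outlet names])
    for outlet, text in articles:
        sh = shingles(text)
        for representative, members in groups:
            if jaccard(sh, representative) >= threshold:
                members.append(outlet)
                break
        else:
            groups.append((sh, [outlet]))
    return [members for _, members in groups]


if __name__ == "__main__":
    # Hypothetical sample data, not real articles.
    sample = [
        ("Reuters", "The central bank raised interest rates by a quarter "
                    "point on Tuesday, citing persistent inflation."),
        ("random-site.example", "The central bank raised interest rates by a "
                                "quarter point on Tuesday, citing persistent "
                                "inflation. (Reuters)"),
        ("Local Paper", "City council approves a new budget for road repairs "
                        "after a long debate."),
    ]
    for members in group_equivalents(sample):
        print(members)
```

Each resulting group could then be shown as a single story with a list of outlets to pick from. Comparing every article against every group is fine for a small feed; a larger one would probably want something like MinHash to avoid the pairwise comparisons.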