They used to be superb at detecting duplicated content. They were also extremely good at telling spam from ham. Nowadays it feels like they don't even care anymore, and whatever filters they have are either broken or untrained.
Copycat sites also used to be extremely careful not to look like copycat sites, and not to duplicate content within the same site. I am surely not alone in recalling the old mantra of not duplicating content.
Copycat sites don't seem to care anymore.
I don't believe there is an overwhelming number of them for Google et al. to deal with: it's often the same names topping the search results, and those are exactly what such filters could remove through semi-manual user action.
Which leads to the conclusion - Google don't care about duplicate content any more.
Were they? I remember having to manually block a lot of those copycat Wikipedia/Stack Overflow sites myself back in 2011 or 2012, when they had the domain-blocklist option available for users. When the feature was removed, it all came back.
Maybe the problem is just that there are more of those now.
Google removed that option without even trying to spin it as a pro-consumer change. The only problems I can think of that it caused for Google are clueless users complaining that they could no longer see microsoft.com in their results, and a negative impact on unethical advertisers.
I've noticed a recent trend where the copycat/adware sites are "up-ranked" relative to original content. This would be the expected behavior of a search engine optimizing for clicks and revenue.
Part of the problem might have been that Stack Overflow has been busy shooting themselves in both feet for years.
For a while (maybe around 2012 - 2017 or so) it felt like it was almost the rule that if you found a really useful question on Stack Overflow, it would be marked as low quality.
Eventually I guess those questions were pruned, and that might explain a bit of why the copycat sites rose.
They annoyed me too, though, as they often mixed together unrelated questions on the same page and got hits for very specific queries that were unrelated.
They should give the YouTube audio fingerprint team a shot at it.
But seriously, Google doesn't need to do anything besides bringing back the option to hide certain domains from the results forever. Even if they don't analyze what domains people are hiding, it would dramatically improve the usability.
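For what it's worth, the filtering side of a per-user domain blocklist is only a few lines. A rough sketch in Python, assuming the results arrive as a plain list of URLs; `is_blocked`, `filter_results` and the example domains are made up for illustration and are not anything Google exposes:

```python
from urllib.parse import urlsplit


def is_blocked(url, blocked_domains):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Match on a label boundary so "notcopycat.example" is not caught
    # by a rule for "copycat.example".
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def filter_results(urls, blocked_domains):
    """Drop every result whose host is on the user's (hypothetical) blocklist."""
    blocked = {d.lower() for d in blocked_domains}
    return [u for u in urls if not is_blocked(u, blocked)]


results = [
    "https://en.wikipedia.org/wiki/X",
    "https://www.copycat.example/wiki/X",
    "https://notcopycat.example/",
]
visible = filter_results(results, ["copycat.example"])
```

Here `visible` keeps the Wikipedia link and the unrelated domain and drops the `www.copycat.example` one. Matching on whole host labels rather than substrings is the only design decision that matters much; everything else is storing a set of domains per user.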