by remus 1432 days ago
> the identical text copied from stack overflow should be easily identifiable

Google starts matching content from SO => spammers start tweaking the text slightly => Google implements some expensive similarity score to downrank copycat sites => spammers use more complex scrambling => ...
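For a sense of what a basic "similarity score" for copycat pages could look like, here is a minimal sketch, assuming pages are compared as sets of overlapping 3-word shingles scored with Jaccard similarity. The shingle size and the sample strings are hypothetical illustrations, not anything Google is known to use. It shows why a slight tweak (one inserted word) barely moves the score, while an unrelated text scores zero.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample texts: an "original" answer, a lightly tweaked copy, and an unrelated one.
original = "To reverse a list in Python you can call the reverse method or use slicing with a step of minus one"
tweaked = "To reverse a list in Python you can simply call the reverse method or use slicing with a step of minus one"
unrelated = "Use a context manager to make sure the file is closed even if an exception is raised"

print(round(jaccard(shingles(original), shingles(tweaked)), 2))    # one inserted word: still high
print(round(jaccard(shingles(original), shingles(unrelated)), 2))  # no shared shingles
```

Inserting one word only breaks the few shingles that span it, so the tweaked copy keeps most of its overlap. The "expensive" part is that comparing every page against every other page this way is quadratic in the number of pages.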

> volunteers put together a list of these sites themselves

These lists only work because they're used by a tiny minority of people. If Google were to do this the spammers would start switching domains more quickly (or find some other workaround).

I'm no Google apologist but I think you're underestimating how hard search ranking is when spammers are actively trying to game the system.


> tweaking the text slightly

That's exactly the kind of thing ML is good at detecting, and ML is Google's forte.
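Catching lightly edited copies doesn't necessarily need ML at all. A classical, cheap technique is MinHash: each page is reduced to a short fixed-size signature, and the fraction of matching signature positions estimates the Jaccard similarity of the pages' shingle sets. The sketch below assumes word 3-shingles and 128 hash functions derived from `hashlib.blake2b` with the seed mixed into the input; these are hypothetical parameters chosen for illustration, and this is not a description of what Google actually runs.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed: int, item: tuple[str, ...]) -> int:
    # One of many seeded hash functions (assumption: seed prefixed to the input bytes).
    data = f"{seed}:{item!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: set, num_hashes: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot build a MinHash signature for an empty set")
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where the two signatures agree (approximates Jaccard)."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


original = "To reverse a list in Python you can call the reverse method or use slicing with a step of minus one"
tweaked = "To reverse a list in Python you can simply call the reverse method or use slicing with a step of minus one"
unrelated = "Use a context manager to make sure the file is closed even if an exception is raised"

sig_orig = minhash_signature(shingles(original))
print(estimated_jaccard(sig_orig, minhash_signature(shingles(tweaked))))    # close to the exact value 17/22
print(estimated_jaccard(sig_orig, minhash_signature(shingles(unrelated))))  # disjoint sets: 0.0
```

Signatures can be computed once per page and bucketed, so near-duplicates are found without comparing every pair of full texts. Whether that stays ahead of spammers who rewrite more aggressively is exactly the arms race described upthread.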

Some of these sites have been returned as top results for a while, so are you suggesting that Google just gave up because spammers would be able to evade them with an update?

Yes, it is an arms race, but Google has far more resources than the spammers do, so it should stay ahead easily.

You underestimate the resources google has at its disposal.

They simply don’t care because there is no real competition to worry about. Even with this spam you are still likely to use Google, so why would a profit-motivated company bother?