Hacker News
by lifeisstillgood 1786 days ago
Can I just clarify?

There are organisations that a) scrape legitimate sites for content, b) host that content on their own 100K domains, c) sit behind Cloudflare, d) do some SEO???, e) inject an ad or similar rubbish when someone finds their site, and f) do this often enough that they make money off the ad / competition / porn?

That seems like the problem the "original-source" meta tag was supposed to stop?

1 comments

Canonical URLs help with flagging your own intentionally duplicated content. But that tag (a link element with rel="canonical") goes on the duplicated content, pointing back at the original. So it doesn't help with scrapers, who strip it out.
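To make that concrete, here is a rough sketch of how a crawler might read the canonical tag, and why a scraped copy gives it nothing to go on. The two pages, the example.com URL and the find_canonical helper are all made up for illustration; only Python's standard html.parser is used.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel can hold several space-separated tokens, e.g. rel="canonical nofollow"
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared by a page, or None if it declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical pages: a legitimate duplicate that points at the original,
# and a scraper's copy of the same body with the tag stripped out.
duplicate = (
    '<html><head><link rel="canonical" href="https://example.com/post">'
    "</head><body>Hello</body></html>"
)
scraped = "<html><head></head><body>Hello</body></html>"

print(find_canonical(duplicate))
print(find_canonical(scraped))
```

The tag only works when the copy cooperates, which is the whole problem with scrapers.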
But I thought it was useful for Google, which could find two cached copies with the same content, one from 2018 and one from 2020, both saying "this is canonical". At that point the 2018 version is treated as the real one and the other is rejected.

Then again, you could just do it with publication dates ...
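A minimal sketch of that "earliest copy wins" idea, assuming the crawler records when it first saw each page. This is not a claim about how Google actually does it; the Copy record, the pick_originals function and the URLs are all hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Copy:
    url: str
    first_seen: date  # assumption: the crawler knows when it first saw the page
    text: str


def pick_originals(copies):
    """Group copies with identical text and return the earliest URL of each group."""
    groups = defaultdict(list)
    for copy in copies:
        # Collapse whitespace so trivial reformatting doesn't hide a duplicate.
        normalised = " ".join(copy.text.split())
        key = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        groups[key].append(copy)
    return [min(group, key=lambda c: c.first_seen).url for group in groups.values()]


copies = [
    Copy("https://scraper.example/post", date(2020, 5, 1), "Same  article text"),
    Copy("https://original.example/post", date(2018, 3, 1), "Same article text"),
]
print(pick_originals(copies))
```

It only catches exact copies (after whitespace cleanup), and it trusts the crawl date, so a scraper that gets crawled before the original would still win.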

I don't know why, but Google seems unable to figure out (or just doesn't care) "who published it first". I've seen it get confused many times.