Hacker News (mirror)

by lifeisstillgood 1786 days ago

Can I just clarify?

There is/are organisation(s) that (a) scrape legitimate sites for content, (b) host that content on their own 100K domains, (c) sit behind Cloudflare, (d) do some SEO (?), (e) inject an ad or similar rubbish when someone lands on their site, and (f) do this at enough scale to make money off the ads / competitions / porn?

Isn't that the problem the "original-source" meta tag was supposed to stop?

tyingq 1786 days ago

Canonical URLs help you flag your own intentionally duplicated content. But that tag goes on the duplicated page itself, so it doesn't help against scrapers, who simply strip it out.
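
A minimal Python sketch of this point, assuming the canonical hint is the standard `<link rel="canonical" href="...">` element in the page head. The `find_canonical` helper and the example.com URLs are hypothetical, for illustration only. The hint exists only in the page's own markup, so once a scraper copies the body without that element, nothing in the copy points back to the original.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; attribute values may be None.
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr_map.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in html, or None if there is none."""
    parser = CanonicalFinder()
    parser.feed(html)
    parser.close()
    return parser.canonical


# Hypothetical pages: the original declares itself canonical; the scraped copy
# has the same body but the <link> element has been stripped.
original = '<html><head><link rel="canonical" href="https://example.com/post"></head><body>Post</body></html>'
scraped = '<html><head></head><body>Post</body></html>'

print(find_canonical(original))  # https://example.com/post
print(find_canonical(scraped))   # None
```

A crawler comparing the two copies only sees a canonical pointer on the page that still carries it; the stripped copy offers no signal either way.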

lifeisstillgood 1786 days ago

But I thought it was useful for Google, which could find two cached copies with the same content, one from 2018 and one from 2020, both saying "this is canonical". At that point the 2018 version is the real one and the other is rejected.

Then again, you could just do it with publication dates ...

tyingq 1786 days ago

I don't know why, but Google seems unable to figure out (or just doesn't care) "who published it first". I've seen it get confused many times.
