by mtbcoder
4172 days ago
Regarding the spam sites: your RSS feed publishes your full articles. More than likely, the scraper sites are pulling directly from that feed, republishing quickly, and getting Googlebot to see the content before it sees it on your site (so they receive the attribution). I would suggest:
1) Publish only summaries in your RSS feeds.
2) Delay the RSS feed by several hours so that your latest article is not listed immediately.
3) Upon publishing, immediately link to the article from all of your social media outlets.
4) When linking internally within articles, use full (absolute) URLs rather than relative paths. (If the spam sites pull your content directly without cleaning it up, you may get a link back to your site from the scraped copy.)
When publishing, timing is everything. Just my $0.02, based on my own experience dealing with spam sites.
On a side note, even though we are in the age of HTML5, I would still suggest sticking to one H1 tag per page, if possible.
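Suggestions 1 and 2 can both live in the code that generates the feed. The sketch below is a hypothetical Python example, not the commenter's setup: the `Post` record, the `summarize` and `build_feed` helpers, the example.com URLs, and the six-hour default delay are all assumptions. It builds an RSS 2.0 feed with the standard library, leaves out any post younger than the delay, and puts a short plain-text teaser in `<description>` instead of the full body.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


@dataclass
class Post:
    # Hypothetical post record; the field names are illustrative only.
    title: str
    url: str
    body: str
    published: datetime  # must be timezone-aware


def summarize(text: str, max_chars: int = 200) -> str:
    """Return a short plain-text teaser instead of the full article body."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return text
    # Cut at the last word boundary inside the limit.
    return text[:max_chars].rsplit(" ", 1)[0] + "..."


def build_feed(posts, site_url, now=None, delay=timedelta(hours=6)):
    """Build an RSS 2.0 feed listing only posts older than `delay`,
    each with a summary rather than the full text (6h default is assumed)."""
    if now is None:
        now = datetime.now(timezone.utc)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example blog"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Latest posts"

    # Throttle: hold back anything published within the delay window.
    visible = [p for p in posts if now - p.published >= delay]
    for post in sorted(visible, key=lambda p: p.published, reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post.title
        ET.SubElement(item, "link").text = post.url
        ET.SubElement(item, "guid").text = post.url
        ET.SubElement(item, "pubDate").text = format_datetime(post.published)
        # Summary only: the full body never enters the feed.
        ET.SubElement(item, "description").text = summarize(post.body)
    return ET.tostring(rss, encoding="unicode")
```

Because the delay is applied when the feed is generated, the feed has to be regenerated (or served dynamically) for a held-back post to appear once its window has passed.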
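For suggestion 4, articles that already contain relative links can have them rewritten when the page or feed is rendered. This is a minimal sketch with a hypothetical `absolutize_links` helper that only handles quoted `href` attributes; the regex is a simplification, and a real site would more likely do this in its templating layer or with a proper HTML parser. `urllib.parse.urljoin` resolves each `href` against the article's own URL and leaves already-absolute URLs unchanged.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...' (a simplification: unquoted
# attributes and other link-bearing attributes are ignored).
HREF_RE = re.compile(r"""href=(["'])(.*?)\1""", re.IGNORECASE)


def absolutize_links(html: str, page_url: str) -> str:
    """Rewrite relative href values to absolute URLs based on page_url."""
    def repl(match: re.Match) -> str:
        quote, href = match.group(1), match.group(2)
        # urljoin returns href unchanged if it is already absolute.
        return f"href={quote}{urljoin(page_url, href)}{quote}"

    return HREF_RE.sub(repl, html)
```

Whether a scraper keeps these links depends entirely on how much it cleans up the copied HTML, so this is a possible side benefit rather than a protection.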
The semantic web could fix this a little by making content easier to scrape with the <article> tag, but publishing content is exactly what RSS was meant to do.
I wish Google would find a better way to handle this (if that's even possible). It's the same as the real argument against single-page apps: "they can't be indexed," or "SEO, man." Discoverability shouldn't hold back progress (in an ideal world, I know). Rather, indexing should adapt to new technology so that we can build a better web that's still discoverable by users.