Hacker News
by enan 3928 days ago
Thanks for the comments. Our crawler does respect the robots.txt standard and the nofollow tag. It seems like noarchive is what Google recommends. Will look into it more.

Although we do put a banner on the index page, we don't have one on each page. Thanks for pointing it out, will fix!
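For anyone curious what honoring noarchive could look like on the crawler side, here is a minimal sketch, not our actual implementation. It assumes the directive arrives as a `<meta name="robots" content="...">` tag in the page HTML; the names `RobotsMetaParser` and `may_archive` are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize them.
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def may_archive(html: str) -> bool:
    """Return False when the page asks not to be archived (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives
```

A crawler would call `may_archive(page_html)` before storing a copy and skip the page when it returns False. This sketch only looks at the meta tag; a directive sent in an HTTP response header would need a separate check.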

1 comment

Even more important than that for me (and possibly for you too) is that you make sure none of these pages make it into Google's index.

The duplication of content (potentially sending the original pages down in search ranks) and the fact that you are polluting the organic search results for the sites you mirror could be a big issue for the owners of the pages.

Good point! There is a robots.txt that prevents the site from getting indexed now: http://hn.getpageback.com/robots.txt
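As a rough illustration of what such a file does, here is a small sketch using Python's standard `urllib.robotparser`. The rules in `BLOCK_ALL` are an assumption about what a block-everything robots.txt typically contains, not a copy of the file at the URL above, and `examplebot` and the page path are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of a robots.txt that asks every crawler to stay out.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder user agent and path, used only to exercise the rules.
    print(is_allowed(BLOCK_ALL, "examplebot", "http://hn.getpageback.com/some-page"))
```

With a `Disallow: /` rule under `User-agent: *`, any crawler that follows the standard treats every path on the host as off limits.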