I have this in my Apache conf for a site I don't want indexed, archived, etc.:

  Header set X-Robots-Tag "noindex, nofollow, noarchive, nositelinkssearchbox, nosnippet, notranslate, noimageindex"

Of course, the only one that totally ignored it was the beeping Internet Archive, which scraped my site. And now, despite my trying many times, they won't remove it. Otherwise it seems to mostly work. I also have Anubis in front of it now to keep the scrapers at bay. (It's a personal diary website, started in 2000 before the term "blog" existed [EDIT: Not true, see comment below]. I know it's public content; I just don't want it publicly searchable.)
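The header is only a request, so it helps to confirm it is actually being served (the `Header` directive needs Apache's mod_headers). Here is a minimal sketch that does a HEAD request and collects the `X-Robots-Tag` directives. The function names are hypothetical. It assumes plain comma-separated directives with no user-agent prefix such as "googlebot: noindex". Seeing the header in a response doesn't mean a given crawler respects it.

```python
import urllib.request


def parse_x_robots_tag(value: str) -> set[str]:
    """Split an X-Robots-Tag value like "noindex, nofollow" into lowercase directives.

    Hypothetical helper. Assumes no user-agent prefix ("googlebot: noindex")
    and no directives with arguments ("max-snippet: 50").
    """
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def fetch_x_robots_tag(url: str) -> set[str]:
    """Send a HEAD request and return every directive found in X-Robots-Tag headers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        # A server may send several X-Robots-Tag headers; get_all returns them all (or None).
        values = resp.headers.get_all("X-Robots-Tag") or []
    directives: set[str] = set()
    for v in values:
        directives |= parse_x_robots_tag(v)
    return directives


if __name__ == "__main__":
    header = "noindex, nofollow, noarchive, nositelinkssearchbox, nosnippet, notranslate, noimageindex"
    print(sorted(parse_x_robots_tag(header)))
```

To check a live site you would call something like `"noarchive" in fetch_x_robots_tag("https://example.com/")`, with your own URL in place of the placeholder.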
In all honesty, if you're hosting it on the internet, why is this a problem? If you didn't want it to be backed up, why is it publicly accessible at all? I'm glad the Internet Archive will keep hosting this content even after the original is long gone.
Say I had read your website and wanted to look it up again in the far future, only to find that the domain had expired years earlier. I'd be damn glad that at least one organization had kept it readable.