by anon4 3839 days ago
robots.txt already lets you specify per-robot behaviour. You can trivially opt out of crawling but opt in to archiving by explicitly allowing archive.org's bot and disallowing all other user agents.
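
For illustration, here is a rough sketch of what that could look like, checked with Python's standard `urllib.robotparser`. The user-agent token `ia_archiver` is an assumption about how archive.org's crawler identifies itself (check their current docs), and `example.com` and the `allowed` helper are hypothetical names used only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow one archiving bot, disallow everyone else.
# "ia_archiver" is an ASSUMED user-agent token for archive.org's crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str = "https://example.com/page.html") -> bool:
    """Return True if the given user agent may fetch the URL under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Compare the archiving bot against an arbitrary other crawler.
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, allowed(agent))
```

Of course this only works for bots that actually honour robots.txt; it's a request, not an access control.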