| HN Mirror



	by toomuchtodo 4582 days ago
	You just ignore the robots.txt file and crawl slowly from distributed virtual machines. Not that you should do that. Robots.txt is only a nicety, though: the client doesn't have to respect it, and the server doesn't have to allow your HTTP requests.
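
	To make the "nicety" point concrete, here is a minimal sketch of what respecting robots.txt looks like from the client side, using Python's standard `urllib.robotparser`. The robots.txt body, the `example-bot` agent name, and the example.com URLs are made up for illustration. The whole check runs in the client, which is why nothing but the crawler's own good manners enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

AGENT = "example-bot"  # hypothetical user-agent name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The client asks itself whether it may fetch each URL; the server is not involved.
allowed_public = parser.can_fetch(AGENT, "https://example.com/index.html")
allowed_private = parser.can_fetch(AGENT, "https://example.com/private/data")

# Delay in seconds the site asks for between requests, or None if unspecified.
delay = parser.crawl_delay(AGENT)

print(allowed_public, allowed_private, delay)
```

	A polite crawler skips any URL where `can_fetch` returns False and waits `delay` seconds between requests. The server's side of the bargain is separate: it can still rate-limit or refuse any request it likes.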