HN Mirror

Hacker News


	by 1vuio0pswjnm7 1614 days ago
	A crawl requires "extraction" of data from a web page, which, according to Wikipedia, is part of the definition of so-called "web scraping". Even a crawler that relies on a sitemap.xml file still has to "scrape" (retrieve and extract from) that file first. So it seems crawling always requires scraping. The distinction is in what is known up front: if all the pages to be retrieved are known a priori, before retrieval begins, one would likely call that "scraping"; if not all pages are known before retrieval begins, one would likely call that "crawling".
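
A rough sketch of that distinction, in Python. Everything here is made up for illustration: `PAGES` is a hypothetical in-memory site standing in for real HTTP retrieval, and `scrape`, `crawl`, and `extract_links` are names chosen for this example, not anyone's actual tool. The point is only that both functions call the same extraction step, and they differ in whether the list of pages is fixed before retrieval begins.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "site"; a real tool would fetch these over HTTP.
PAGES = {
    "/": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/c">C</a>',
    "/b": '<a href="/">home</a>',
    "/c": "no links here",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """The "extraction" step shared by scraping and crawling."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def scrape(urls):
    """All pages are known a priori: retrieve each and extract from it."""
    return {url: extract_links(PAGES[url]) for url in urls}


def crawl(start):
    """Only the start page is known: discover the rest while retrieving."""
    seen = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Unknown URLs are treated as empty pages in this toy example.
        for link in extract_links(PAGES.get(url, "")):
            if link not in seen:
                seen.append(link)
                queue.append(link)
    return seen
```

Calling `scrape(["/a", "/c"])` touches exactly the two pages it was given, while `crawl("/")` starts from one page and ends up visiting all four, because each extraction feeds new pages into the queue.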