A crawl requires "extraction" of data from a web page, which according to Wikipedia is part of the definition of so-called "web scraping". Even if a crawler is using a sitemap.xml file, it still has to "scrape" (retrieve and extract from) that file first. It seems crawling always requires scraping.
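To make the sitemap point concrete, here is a minimal sketch of the "extract" half of that step. It assumes the file has already been retrieved; the `SAMPLE_SITEMAP` content and the `extract_sitemap_urls` helper are made up for illustration, and only the namespace URI comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>, <url> and <loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap body; a real crawler would first retrieve this over HTTP.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def extract_sitemap_urls(xml_text):
    """Extract every <loc> value from a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    for url in extract_sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

Even in this best case for the crawler, the URL list only exists after a document has been fetched and parsed, which is the same retrieve-and-extract step the word "scraping" describes.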

If every page to be retrieved is known before retrieval begins, one would likely call that "scraping"; if some pages are only discovered during retrieval, one would likely call that "crawling".
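A rough sketch of that distinction, under some loud assumptions: the "web" here is a hypothetical in-memory dict (`PAGES`), `fetch` stands in for an HTTP request, and `scrape` and `crawl` are illustrative names rather than anyone's real API. The only difference between the two functions is whether the list of pages exists before the loop starts.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory site: path -> HTML body. Stands in for real HTTP.
PAGES = {
    "/": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/c">C</a>',
    "/b": '<a href="/">home</a>',
    "/c": "no links here",
}


def fetch(path):
    """Stand-in for retrieval; returns an empty page for unknown paths."""
    return PAGES.get(path, "")


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def scrape(paths):
    """'Scraping' in the sense above: the full page list is known up front."""
    return {path: fetch(path) for path in paths}


def crawl(seed):
    """'Crawling' in the sense above: pages are discovered while retrieving."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        path = queue.popleft()
        visited.append(path)
        # The extraction step: every crawl iteration also scrapes the page.
        for link in extract_links(fetch(path)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    print(list(scrape(["/a", "/c"])))
    print(crawl("/"))
```

Note that `crawl` calls `extract_links` on every page it fetches, so the crawler contains a scraper, while `scrape` never needs to discover anything.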