| HN Mirror



	by ComputerGuru 781 days ago
	Does a "web" data source only scrape the individual page or linked pages as well? I'm assuming the former. What would be the least painful way to ingest a knowledgebase (say a wiki-like site) from the web?


s1lv3rj1nx 781 days ago

It can scrape linked pages too if you define a depth, but make sure the depth parameter is not too large, or the crawl will consume too much memory and time.
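To illustrate why depth matters, here is a minimal sketch of a depth-limited breadth-first crawl. It is not the tool's actual implementation: `crawl`, `get_links`, `max_depth`, and the toy `SITE` graph are all hypothetical names, and the link graph is kept in memory so the example runs without network access.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(start: str, get_links: Callable[[str], Iterable[str]], max_depth: int) -> List[str]:
    """Breadth-first crawl that does not follow links from pages at max_depth."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # page is kept, but its links are not followed
        for link in get_links(url):
            if link not in seen:  # avoid revisiting pages and looping on cycles
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order


# Toy link graph standing in for real pages (hypothetical URLs).
SITE: Dict[str, List[str]] = {
    "https://example.com/foo/": ["https://example.com/foo/a", "https://example.com/foo/b"],
    "https://example.com/foo/a": ["https://example.com/foo/a/deep"],
    "https://example.com/foo/b": [],
    "https://example.com/foo/a/deep": ["https://example.com/foo/a/deeper"],
}


def toy_links(url: str) -> List[str]:
    """Return the outgoing links of a page in the toy graph."""
    return SITE.get(url, [])
```

Each extra level of depth can multiply the number of pages visited by the average number of links per page, which is where the memory and time cost comes from. A depth of 0 returns only the starting page.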


ComputerGuru 780 days ago

Playing around with the UI, I cannot see where that depth would be set. Is it not a per-datasource variable?

Is "scrape linked pages" sandboxed within a URL hierarchy (so adding example.com/foo/ would add only linked pages that are also under example.com/foo/), or not (so it would also follow links to other domains or subfolders)?
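To make "sandboxed" concrete, this is roughly the check I have in mind, sketched with Python's standard `urllib.parse`. The function name `in_scope` and its parameters are my own illustration, not anything taken from the tool.

```python
from urllib.parse import urljoin, urlsplit


def in_scope(root: str, page_url: str, href: str) -> bool:
    """Return True if href, resolved against page_url, lies under root's host and path."""
    target = urlsplit(urljoin(page_url, href))  # resolve relative links first
    base = urlsplit(root)
    # Normalize the root path to end in "/" so "/foo" does not match "/foobar".
    prefix = base.path if base.path.endswith("/") else base.path + "/"
    return (
        target.scheme in ("http", "https")
        and target.netloc == base.netloc
        and (target.path + "/").startswith(prefix)
    )
```

With root `https://example.com/foo/`, a relative link `bar` found on `https://example.com/foo/index` is in scope, while `/other/page`, `../foobar`, and any link to another domain are not.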
