Hacker News
by ComputerGuru 781 days ago
Does a "web" data source only scrape the individual page or linked pages as well? I'm assuming the former. What would be the least painful way to ingest a knowledgebase (say a wiki-like site) from the web?

It can scrape linked pages too if you define a crawl depth, but keep the depth parameter small; otherwise the crawl will consume too much memory and time.
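To illustrate why depth matters: if each page links to roughly b new pages, a crawl of depth d can reach on the order of b^d pages. Below is a minimal sketch of a depth-limited breadth-first crawl. It is not this tool's actual implementation; `crawl`, `get_links`, and the in-memory `SITE` map are hypothetical names standing in for real HTTP fetching and link extraction.

```python
from collections import deque
from typing import Callable, Iterable

def crawl(start: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int) -> list[str]:
    """Visit pages breadth-first, at most max_depth links away from start."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in get_links(url):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append((link, depth + 1))
    return order

# Hypothetical toy "site": page -> pages it links to (no network access).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": [], "/a2": [],
    "/b1": ["/"],  # link back to the start page; ignored via `seen`
}

def site_links(url: str) -> list[str]:
    return SITE.get(url, [])

pages_depth_1 = crawl("/", site_links, 1)
pages_depth_2 = crawl("/", site_links, 2)
```

With this toy site, depth 1 visits 3 pages and depth 2 visits 6, which is the kind of growth that makes a large depth expensive on a real wiki.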
Playing around with the UI, I cannot see where that depth would be set. Is it not a per-datasource variable?

Is the "scrape linked pages" behavior "sandboxed" within a URL hierarchy (so adding example.com/foo/ would add only linked pages that are also under example.com/foo/), or not (so it would also include linked pages on other domains or in other subfolders)?
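For concreteness, the "sandboxed" behavior described above could be expressed as a scope check like the sketch below. Whether the tool actually does this is exactly the open question; `in_scope` is a hypothetical helper, and it uses only the standard library's `urllib.parse`.

```python
from urllib.parse import urljoin, urlsplit

def in_scope(link: str, base: str) -> bool:
    """True if link, resolved against base, is on the same host and under base's path."""
    target = urlsplit(urljoin(base, link))  # resolve relative links first
    root = urlsplit(base)
    # Compare against a path ending in "/" so /foo/ does not match /foobar.
    root_path = root.path if root.path.endswith("/") else root.path + "/"
    same_origin = (target.scheme, target.netloc) == (root.scheme, root.netloc)
    return same_origin and target.path.startswith(root_path)

base = "https://example.com/foo/"
links = [
    "bar",                          # relative link under /foo/
    "/other/page",                  # same domain, different subfolder
    "https://elsewhere.org/foo/x",  # different domain
    "https://example.com/foobar",   # shares a string prefix only
]
kept = [link for link in links if in_scope(link, base)]
```

A crawler that filters each discovered link through a check like this before queueing it would stay inside example.com/foo/; one that skips the check would follow links to other domains and subfolders.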