by londons_explore 2658 days ago
Creating your own search engine in today's world is pretty much impossible.

For one thing, loads of sites load all their content via Ajax, so at a minimum you're gonna need a browser engine as the base of your crawler...

1 comment

Headless Chrome is available and widely used, and you can often get around the JS-rendering problem by simply waiting a few seconds after page load before scraping. I'd assume the crawling itself isn't the hard part (aside from maybe the raw compute time it takes).
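
For what it's worth, here is a rough sketch of the "load it in headless Chrome, give the JS some time, then grab the DOM" approach. It shells out to Chrome with `--dump-dom` and uses `--virtual-time-budget` as a stand-in for the wait, which is not quite the same as sleeping for wall-clock seconds. The binary name `google-chrome`, the 3000 ms default, and both function names are my own assumptions, and how these flags behave can differ between Chrome versions and headless modes.

```python
import subprocess


def build_dump_dom_command(url, wait_ms=3000, chrome_binary="google-chrome"):
    """Build the argv for a headless Chrome run that prints the rendered DOM.

    chrome_binary is an assumption; it may be "chromium", "chrome", or a
    full path depending on the system.
    """
    return [
        chrome_binary,
        "--headless",
        "--disable-gpu",
        "--dump-dom",  # print the serialized DOM to stdout
        f"--virtual-time-budget={wait_ms}",  # let page timers/requests run for this much virtual time
        url,
    ]


def fetch_rendered_html(url, wait_ms=3000, chrome_binary="google-chrome", timeout_s=60):
    """Run headless Chrome and return the DOM after scripts have had time to run."""
    result = subprocess.run(
        build_dump_dom_command(url, wait_ms, chrome_binary),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # hard wall-clock limit so a stuck page cannot hang the crawler
        check=True,  # raise CalledProcessError if Chrome exits with an error
    )
    return result.stdout
```

Calling `fetch_rendered_html("https://example.com")` should return the page's HTML as a string, assuming Chrome is installed under that name. Starting a fresh browser process per page is slow, which is where the raw compute cost comes from at crawl scale.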