


	by prophesi 1820 days ago
	> I think a few companies use Elixir to power their web crawling/scraping tools. What do they use for headless browser scraping? I tried Hound a few months ago, but it seems too geared towards testing to be used more generically. We ended up just using Oclif and Puppeteer for scraping via NodeJS.


vereis 1820 days ago

It might fall into the same category as Hound, but Wallaby exists and works.

Otherwise, have you heard of Crawly?


prophesi 1818 days ago

I'd heard of Crawly and thought it only did HTTP-based crawling, but now I see it has a browser rendering option via Splash[0] that looks like it'd fit the bill. Thanks!

(I also had issues configuring Wallaby for use outside of testing.)

[0] https://splash.readthedocs.io/en/stable/api.html
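For anyone curious what rendering via Splash looks like at the HTTP level, here is a rough sketch in Python of calling the `render.html` endpoint described in [0]. It assumes a Splash instance is already running locally on its default port 8050; the helper names `splash_render_url` and `fetch_rendered_html` are made up for illustration and are not part of any library mentioned in this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is listening locally on its default port.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(page_url: str, wait: float = 0.5, base: str = SPLASH_BASE) -> str:
    """Build a render.html URL; Splash returns the page's HTML after JavaScript has run."""
    # `url` is the page to render, `wait` is how long (in seconds) Splash
    # pauses after page load before taking the HTML snapshot.
    query = urlencode({"url": page_url, "wait": wait})
    return f"{base}/render.html?{query}"


def fetch_rendered_html(page_url: str, wait: float = 0.5, timeout: float = 30.0) -> str:
    """Fetch the rendered HTML through Splash (hypothetical helper; needs Splash running)."""
    with urlopen(splash_render_url(page_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Only `fetch_rendered_html` touches the network. A crawler would call it in place of a plain HTTP GET for pages that need JavaScript, then hand the returned HTML to its usual parser.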
