Hacker News
by prophesi 1820 days ago
> I think a few companies use Elixir to power their web crawling/scraping tools.

What do they use for headless browser scraping? I tried Hound a few months ago, but it seems too geared towards testing to be used more generically. We ended up just using Oclif and Puppeteer for scraping via NodeJS.

1 comment

Might fall into the same category as Hound, but Wallaby exists and works.

Otherwise, have you heard of Crawly?

I'd heard of Crawly but thought it only did HTTP-based crawling. Now I see it has a browser rendering option via Splash[0] that looks like it'd fit the bill. Thanks!

(Also had issues configuring Wallaby to be used outside of testing)

[0] https://splash.readthedocs.io/en/stable/api.html
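
For anyone landing here later, a rough sketch of what browser rendering through Splash looks like over plain HTTP, independent of any crawler framework. It uses the `render.html` endpoint with the `url` and `wait` arguments described in the API docs linked at [0]. Assumptions, not anything from the thread: a Splash instance is already running locally on port 8050, and the function names `splash_render_url` and `fetch_rendered_html` are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is listening locally on port 8050.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url, wait=0.5, base=SPLASH_BASE):
    """Build a Splash render.html URL for the given page (hypothetical helper)."""
    # urlencode percent-escapes the target URL so its own query string
    # is not confused with Splash's arguments.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{base}/render.html?{query}"


def fetch_rendered_html(target_url, wait=0.5, timeout=30):
    """Return the page's HTML after Splash has run its JavaScript."""
    with urlopen(splash_render_url(target_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only builds the request URL; call fetch_rendered_html() once Splash is up.
    print(splash_render_url("https://example.com"))
```

The scraper only ever makes an ordinary HTTP request, so the headless browser lives in a separate service rather than inside a test harness.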