by tegansnyder 3035 days ago
A lot of folks are reevaluating their crawling engines lately now that Chrome headless is maturing. To me, CPU and memory footprint are important considerations when distributing a large headless crawling architecture.

What we are not seeing open-sourced are the solutions companies are building around trimmed-down, specialized versions of headless browsers like Chrome headless, Servo, and WebKit. People are running distributed versions of these headless browsers using Apache Mesos, Kubernetes, and Kafka queues.
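For illustration, here is a minimal sketch of the consumer side of such a setup, assuming a single Python worker node. An in-process `queue.Queue` stands in for a Kafka topic, and `render` is a hypothetical callback where a real headless browser would load the page. Capping `max_browsers` (a hypothetical parameter) is one way to bound the per-node CPU/memory footprint mentioned above.

```python
import queue
import threading
from typing import Callable, Dict, List


def crawl(urls: List[str], render: Callable[[str], str], max_browsers: int = 4) -> Dict[str, str]:
    """Drain a URL queue with at most `max_browsers` concurrent render workers.

    Assumption: the in-process queue stands in for a Kafka topic, and each
    worker thread stands in for one headless browser instance.
    """
    work: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        work.put(url)

    results: Dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            try:
                html = render(url)  # hypothetical headless-browser call
            except Exception as exc:
                html = f"error: {exc}"  # record failures instead of crashing the worker
            with lock:
                results[url] = html
            work.task_done()

    # A fixed number of workers bounds how many "browsers" run at once.
    threads = [threading.Thread(target=worker) for _ in range(max_browsers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_render(url: str) -> str:
    # Hypothetical stand-in for rendering a page in a headless browser.
    return f"<html>{url}</html>"


if __name__ == "__main__":
    pages = crawl(["https://example.com/a", "https://example.com/b"], fake_render, max_browsers=2)
    for url, html in sorted(pages.items()):
        print(url, len(html))
```

In a real deployment the queue would be a Kafka consumer group and each node's worker count would be sized to its memory budget, with Kubernetes or Mesos scheduling the nodes.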