Hacker News
by afinemonkey 2943 days ago
Very interesting to see this. I've been running a similar infrastructure, sometimes rendering up to 500k pages a day, although often without images. I'm also running on Digital Ocean, but using nightmare.js (https://github.com/segmentio/nightmare), which runs on top of Electron, which in turn runs on top of Chromium.

The CPU and RAM patterns I see are different: CPU usage stays pinned near the maximum, and memory oscillates between 65% and 80%. I believe this is due to a different usage pattern. I basically always have at least 20 to 30 jobs running concurrently on each machine, and they're usually fairly long (up to 10 minutes or so).

Contrary to what you mention, I've never had an issue with pages crashing and bringing the whole browser down. If it has happened, it's negligible compared to the benefit of running, say, 5 pages in parallel. For some tasks I've also had some luck overriding the cross-content policy and using dirty ol' iframes to render multiple webpages in the same session.
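
A rough sketch of the "several pages in parallel in one browser" approach, assuming the goal is simply to cap how many page jobs are in flight at once. It is library-agnostic TypeScript: `runWithLimit` is a hypothetical helper, and each job stands in for opening a tab, rendering it, and closing it (with puppeteer that would be something like `browser.newPage()`, `page.goto()` and `page.close()`).

```typescript
// Hypothetical helper: run async jobs with at most `limit` of them in flight.
// Each job would typically open a page/tab, render it, and close it again.
async function runWithLimit<T>(
  jobs: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0; // index of the next job to start

  // Each worker keeps pulling jobs until none are left. JavaScript is
  // single-threaded, so `next++` cannot race between workers.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const i = next++;
      results[i] = await jobs[i](); // results keep the original job order
    }
  }

  // Start `limit` workers (at least one, never more than there are jobs).
  const workerCount = Math.max(1, Math.min(limit, jobs.length));
  const workers = Array.from({ length: workerCount }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Because each worker picks up the next job as soon as its current one finishes, one slow page doesn't hold up the others. One rejected job makes the whole batch reject (via `Promise.all`), so per-page errors are best caught inside each job.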

I've considered migrating to puppeteer, so it's encouraging to see a large-scale project sharing its experience with it.


Same experience here: we run Chrome in a Docker container, using tabs for multiple pages rather than multiple browsers, and it regularly runs for days without issues. Of course its RAM usage gradually grows, but it's easy enough to stop and restart the container on a schedule, since we have built-in error and fault handling that retries any requests that failed.

I can’t say I’ve seen one tab bring down the entire browser, though I’m sure that’s possible. If it did happen, Docker and the fault handling above would restart the instance and have it back up within seconds.
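
The retry-on-failure part could be sketched like this, assuming a failed render should trigger a restart before the next attempt. `renderWithRetry`, its `restart` callback and the default of 3 attempts are all hypothetical; in practice `restart` might restart the Docker container or relaunch the browser.

```typescript
// Hypothetical retry wrapper: if a render fails, restart the browser
// (or container) and try again, up to `maxAttempts` attempts in total.
async function renderWithRetry<T>(
  render: () => Promise<T>,
  restart: () => Promise<void>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await render();
    } catch (err) {
      lastError = err;
      // Recover before the next attempt; no restart after the final failure.
      if (attempt < maxAttempts) {
        await restart();
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Restarting between attempts rather than retrying in place assumes the failure came from a wedged browser; for transient network errors a plain retry with a short delay may be enough.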

This is what I initially noticed as well (the app would run for several days and eventually the account would cancel due to unavailability ... and you'd get paged).

The screenshot in that post shows an instance that has been running for _months_ under high load. I can't stress enough that relying on tabs/pages will always lead to frequent restarts, which can be really tricky when there are other sessions that need to finish gracefully.
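
One way to soften the graceful-finish problem is to drain before recycling: after a fixed budget of jobs, stop admitting new work, let in-flight sessions complete, then restart. This is a sketch under assumptions, not either poster's setup; the job-count trigger, `RecyclingGate`, `maxJobs` and `restart` are all hypothetical, and a memory threshold could serve as the trigger instead.

```typescript
// Hypothetical "drain, then recycle" gate: once `maxJobs` jobs have started,
// new jobs wait while in-flight sessions finish and the browser is restarted.
// Assumes maxJobs >= 1.
class RecyclingGate {
  private inFlight = new Set<Promise<unknown>>();
  private started = 0;
  private draining: Promise<void> | null = null;
  private readonly restart: () => Promise<void>;
  private readonly maxJobs: number;

  constructor(restart: () => Promise<void>, maxJobs: number) {
    this.restart = restart;
    this.maxJobs = maxJobs;
  }

  async run<T>(job: () => Promise<T>): Promise<T> {
    // Wait out a drain in progress, or start one once the job budget is spent.
    while (this.draining || this.started >= this.maxJobs) {
      if (!this.draining) this.draining = this.drainAndRestart();
      await this.draining;
    }
    this.started++;
    const p = job();
    this.inFlight.add(p);
    try {
      return await p;
    } finally {
      this.inFlight.delete(p);
    }
  }

  private async drainAndRestart(): Promise<void> {
    try {
      // Let every in-flight session finish, whether it succeeds or fails.
      await Promise.all(
        Array.from(this.inFlight, (p) => p.then(() => undefined, () => undefined)),
      );
      await this.restart();
      this.started = 0; // fresh budget for the restarted browser
    } finally {
      this.draining = null;
    }
  }
}
```

Jobs that arrive during a drain simply wait, so no session is cut off mid-render; the trade-off is a short dip in throughput at each restart.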