by mrskitch 1926 days ago
Hey, Joel here, mostly responsible for this tool. Happy to answer questions. One thing not well covered is getting puppeteer to run in the browser, especially in a web worker. Can talk more about it if there’s interest!
3 comments

This is magical and cool.

As a web dev generalist, I can usually understand how most things work under the hood.

But playing with chrome.browserless.io breaks that. You're streaming the web page in a <canvas> element, but how can I highlight text? When I load a youtube video page are you literally proxying the video through your infra, through <canvas> pixels to my browser?

Who dictates what IP the headless chrome is assigned? Do you have a lot of IPs? I noticed on some pages I'd get the Cloudflare captcha, which makes sense if browserless has to cycle through a limited set of IPs that other people have been using to scrape other Cloudflare pages.

Yup, there’s a lot going on here. Currently the tool uses a fixed IP for the running browser. That’s why you’re seeing that Cloudflare issue.

As far as the hovering goes, the canvas element is “mirroring” interactions back through to the underlying page. When DevTools is active, this triggers Chromium to render hover effects in its GUI. The rendered result then gets sent back to the canvas element in the debugging page.

It’s a lot of network traffic and synchronization, but once everything is set up it works fairly seamlessly.
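
To make the mirroring idea concrete, here is a minimal sketch of how a canvas pointer position could be translated into a message for the remote page. It assumes the Chrome DevTools Protocol's `Input.dispatchMouseEvent` method is the transport; the thread does not say which protocol calls the tool actually uses, and the helper names `toPageCoords` and `mouseMovedMessage` are hypothetical.

```typescript
// Sketch only: maps a pointer position on the local <canvas> to the remote
// page's viewport and wraps it in a DevTools Protocol message.
// Assumption: the remote browser accepts Input.dispatchMouseEvent over a
// DevTools WebSocket. Helper names here are illustrative, not from the tool.

interface Size {
  width: number;
  height: number;
}

interface CdpMessage {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Scale canvas coordinates to viewport coordinates, since the canvas may be
// drawn at a different size than the remote page.
function toPageCoords(
  canvasX: number,
  canvasY: number,
  canvas: Size,
  viewport: Size
): { x: number; y: number } {
  return {
    x: Math.round((canvasX / canvas.width) * viewport.width),
    y: Math.round((canvasY / canvas.height) * viewport.height),
  };
}

// Build the message that asks the remote browser to move its mouse pointer.
function mouseMovedMessage(
  id: number,
  canvasX: number,
  canvasY: number,
  canvas: Size,
  viewport: Size
): CdpMessage {
  const { x, y } = toPageCoords(canvasX, canvasY, canvas, viewport);
  return {
    id,
    method: "Input.dispatchMouseEvent",
    params: { type: "mouseMoved", x, y },
  };
}
```

In a page, something like this would be called from a `mousemove` listener on the canvas, with `JSON.stringify(message)` sent over the open WebSocket. Frames travelling the other way could come from the protocol's `Page.startScreencast`, though that too is an assumption about this tool.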

can you explain how this is architected? you are running a browser on a server and its streaming the video to the canvas? where does the cloudflare bit come in?

very interesting project!

You’ve got it: the core service listens for inbound HTTP Upgrade requests, starts a browser, then funnels the connection into the browser.

This debugger simply sits on top of all that, and puts the code/execution context in the browser rather than on the server. Cloudflare is simply detecting that our server IP is a known headless chrome instance, and is serving their bot detection check.
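
The "listen for Upgrade, start a browser, funnel the connection" flow can be sketched roughly as below, using only Node's built-in `http` and `net` modules. This is not the project's actual code: `BrowserTarget`, `startBrowser`, `serializeUpgradeRequest`, and `attachUpgradeProxy` are hypothetical names, and rewriting the `Host` header for the upstream browser is an assumption about what the browser's debugging endpoint expects.

```typescript
import * as http from "node:http";
import * as net from "node:net";

// Hypothetical description of where a freshly started browser is listening.
interface BrowserTarget {
  host: string;
  port: number;
  path: string; // e.g. the browser's DevTools WebSocket path
}

// Rebuild the client's upgrade request so it can be replayed to the browser.
// The Host header is swapped for the upstream address (an assumption).
function serializeUpgradeRequest(
  method: string,
  path: string,
  rawHeaders: string[],
  upstreamHost: string
): string {
  const lines = [`${method} ${path} HTTP/1.1`];
  for (let i = 0; i + 1 < rawHeaders.length; i += 2) {
    const name = rawHeaders[i];
    const value = name.toLowerCase() === "host" ? upstreamHost : rawHeaders[i + 1];
    lines.push(`${name}: ${value}`);
  }
  return lines.join("\r\n") + "\r\n\r\n";
}

// On each inbound Upgrade request: start a browser, open a TCP connection to
// it, replay the handshake, then pipe bytes in both directions.
function attachUpgradeProxy(
  server: http.Server,
  startBrowser: () => Promise<BrowserTarget> // caller-supplied, hypothetical
): void {
  server.on("upgrade", (req, clientSocket, head) => {
    startBrowser()
      .then((target) => {
        const upstream = net.connect(target.port, target.host, () => {
          upstream.write(
            serializeUpgradeRequest(
              req.method ?? "GET",
              target.path,
              req.rawHeaders,
              `${target.host}:${target.port}`
            )
          );
          if (head.length > 0) upstream.write(head);
          upstream.pipe(clientSocket);
          clientSocket.pipe(upstream);
        });
        upstream.on("error", () => clientSocket.destroy());
        clientSocket.on("error", () => upstream.destroy());
      })
      .catch(() => clientSocket.destroy());
  });
}
```

Because the browser's own handshake response is piped straight back to the client, the server never has to speak WebSocket itself. It only relays bytes once the connection is established.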

Thanks for your answer! Do you do anything to lower the latency between the user and the client?

How to set up my own mirror/instance of chrome.browserless.io? Is there a step-by-step guide for it? The git repo just mentions:

> The application is written in TypeScript, and produces a static asset in the static directory once built.

What should I do with said artifact? How to put it to use?

You can actually install our docker image or use the npm module:

https://github.com/browserless/chrome/blob/5627f1ef041ec23f3...

could you share why you chose puppeteer over playwright? thanks!

Just familiarity with puppeteer; eventually this tool will support both. Puppeteer has fewer “moving parts” for now.