| HN Mirror

by jadell 2137 days ago

One day I will write an extensive post (or set of them) about using Puppeteer to bypass sites' anti-bot measures. It's a fascinating (and annoying) cat-and-mouse game. But at the end of the day, almost all bot detection measures rely on JavaScript to report back metrics about the browser, and that JavaScript runs in an environment where the bot completely controls what gets reported.
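To illustrate that point, here is a minimal sketch runnable in plain Node.js. The `fakeNavigator` object and the `collectMetrics` function are hypothetical stand-ins for the browser's `navigator` and for a detection script; they are not taken from any real product. It only shows that whoever owns the environment can change what a property read returns before the detection code runs.

```javascript
// Stand-in for the browser's `navigator` (an assumption for this sketch).
const fakeNavigator = { webdriver: true, languages: [] };

// Hypothetical detection script: gathers metrics to send back to a server.
function collectMetrics(nav) {
  return { webdriver: nav.webdriver, languageCount: nav.languages.length };
}

const before = collectMetrics(fakeNavigator);

// The party controlling the environment redefines the properties
// before the detection script reads them.
Object.defineProperty(fakeNavigator, "webdriver", { get: () => false });
Object.defineProperty(fakeNavigator, "languages", { get: () => ["en-US", "en"] });

const after = collectMetrics(fakeNavigator);

console.log(before, after);
```

The detection script itself is unchanged between the two calls; only the values it is allowed to see differ.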

One of my favorite tricks I've seen employed is a detection measure that checks whether common detection-bypass tricks have been implemented (for example, by checking the toString output of commonly overridden native functions).
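A rough sketch of what such a toString check might look like, assuming a V8-style source representation for built-ins. The helper name `looksNative` is made up for this example, and real detection scripts are likely more involved.

```javascript
// Hypothetical helper: true when the function's source text ends in the
// "[native code]" body that engines print for built-in functions.
function looksNative(fn) {
  // Call Function.prototype.toString directly so that a `toString`
  // property placed on the function itself is ignored.
  const source = Function.prototype.toString.call(fn);
  return /\{\s*\[native code\]\s*\}\s*$/.test(source);
}

// A naive override: replacing a built-in with an ordinary function.
const naiveOverride = function max() {
  return 0;
};
// Faking the output by assigning `toString` on the function does not help here.
naiveOverride.toString = () => "function max() { [native code] }";

console.log(looksNative(Math.max)); // built-in
console.log(looksNative(naiveOverride)); // ordinary function
```

The check reads the engine's own representation rather than trusting the function's `toString` property, which is why the simple override is caught.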

https://theheadless.dev/posts/challenging-flows/#bot-detecti...

7 comments

from 2137 days ago

I was recently working on the same thing (https://github.com/chris124567/puppeteer-bypassing-bot-detec...). The existing solutions (like the headless-cat-n-mouse repo) seemed to be pretty incomplete and easily detected. I got mine to pass all the checks on Antoine Vastel's site along with Distil Networks' and PerimeterX's bot detection (although in practice they may have other ways of detecting bots, like checking for rapid URL visits).

Something worth noting about toString is that it can now be undetectably modified (to fake “native code”) with the new ES6 Proxy object. There was a really interesting blog post written about this at https://adtechmadness.wordpress.com/2019/03/23/javascript-ta... (I also incorporated this into my project).
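A minimal sketch of the Proxy behavior described here, runnable in Node.js. It assumes a recent V8-based runtime, where a callable Proxy has no source text of its own and is printed with a "[native code]" body; other engines may differ. Note that the printed function name may not match the original, so this is an illustration of the mechanism rather than a claim that the result is indistinguishable.

```javascript
const original = Math.max;

// Wrap the built-in in a Proxy. The `apply` trap is where custom
// behavior would go; this sketch simply forwards the call.
const wrapped = new Proxy(original, {
  apply(target, thisArg, args) {
    return Reflect.apply(target, thisArg, args);
  },
});

// Ask the engine for the source text of the proxy itself.
const text = Function.prototype.toString.call(wrapped);

console.log(text);
console.log(wrapped(1, 5, 3));
```

Unlike assigning a fake `toString` to an ordinary replacement function, nothing here overrides `toString` at all; the "[native code]" text comes from the engine.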

jadell 2137 days ago

Using Proxy is key to a lot of bot detection avoidance.

edit: Really like that repo! I use a lot of those techniques as well.

judge2020 2137 days ago

CF seems to have started classifying browsers with no existing CF cookies as likely bots (a score of 10 or less, where 99 is a human and 1 is confirmed bot) for enterprise users of their Bot Management feature[0]. From my testing, it happens for both puppeteer and incognito tabs of Chrome, even with perfect IP reputation.

0: https://support.cloudflare.com/hc/en-us/articles/36002751945...

lstamour 2137 days ago

That would explain why I always see CF bot prompts when visiting a site for the first time or the hundredth time in Safari with a few layers of tracking protection and no third-party cookies. I prefer to answer captchas if that’s the price I pay for a bit more privacy, then...

kabacha 2136 days ago

CF and Google's captcha are really making the web unbrowsable with hardened browsers. The web is looking really grim for people who care about privacy these days.

superasn 2137 days ago

This may be a very basic tool for this game, but it has served me well. I'm guessing most people know about it, but I'm sharing it for reference:

https://www.npmjs.com/package/puppeteer-extra-plugin-stealth

hoten 2137 days ago

Have you seen this? https://github.com/paulirish/headless-cat-n-mouse

rashkov 2137 days ago

I wonder if Google's captcha will always be able to defeat Puppeteer? It seems odd for Google to publish a set of abusable APIs and not be able to detect their use.

peterangular 2137 days ago

There are farms of people who literally sit around all day and solve CAPTCHAs. There's no surefire way to address this problem, and it usually ends up as an orchestration of reputation-score tooling (including making a user fill out a CAPTCHA) to fingerprint a bot.

If you're good at spoofing all of that fingerprinting, you'll blow straight past them. It's all client-side, where you have control all the way down to the bits and bytes.

vdfs 2137 days ago

You can just use Google's speech-to-text to solve reCAPTCHA's audio challenge

judge2020 2137 days ago

This is their answer: https://support.google.com/websearch/answer/86640?hl=en

jadell 2137 days ago

There are services that will solve captchas for you (including Google's) in "real time", with convenient APIs that allow for automation.

ragog 2137 days ago

Please do! This is a very interesting topic. Looking forward to reading about it.

chrisweekly 2137 days ago

I know it's passé to say "hey, there's an xkcd for that", but this is one of my all-time favorites, and it's directly relevant, so... enjoy! :)

https://xkcd.com/810/