by KTibow 330 days ago
It sucks more that Cloudflare/similar have responded to this with "if your handshake fingerprints more like curl than like Chrome/Firefox, no access for you".
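For anyone wondering what "handshake fingerprint" means in practice: one publicly documented scheme is JA3, which hashes a few fields of the TLS ClientHello. Whether Cloudflare uses JA3 specifically is an assumption here, and the numeric values below are made up for illustration rather than captured from a real client. A rough sketch of the idea:

```python
import hashlib

def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string: five comma-separated fields,
    with the values inside each field joined by '-'."""
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    return ",".join(fields)

def ja3_hash(ja3):
    """The JA3 fingerprint is the MD5 hex digest of that string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()

# Hypothetical ClientHello values, for illustration only.
# Real implementations also drop GREASE values before hashing.
client_a = ja3_string(771, [4865, 4866, 4867], [0, 11, 10], [29, 23], [0])
client_b = ja3_string(771, [4867, 4866, 4865], [0, 11, 10], [29, 23], [0])

# Same cipher suites in a different order give a different fingerprint,
# which is how two clients with different TLS stacks can be told apart.
print(ja3_hash(client_a) != ja3_hash(client_b))
```

The point is that the fingerprint depends on what the TLS library offers and in what order, not on the User-Agent header, so changing headers alone does not change it.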
3 comments

I now write all of my bots in JavaScript and run them from the Chrome console with CORS turned off. It seems to defeat even Google's anti-bot stuff. Of course, I need to restart Chrome every few hours because of memory leaks, but it wasn't a fun 3 days the last time I got banned from their ecosystem, with my kids asking why they couldn't watch YouTube.
Where can I learn more about custom bots in JS and Chrome?
Or getting a CAPTCHA from Chrome when visiting a site you've been to dozens of times (Stack Overflow). Now I just skip that content; it's probably in my LLM already anyway.
Keep in mind that those LLMs are one of the bigger reasons why we see more and more anti-bot behaviour on sites like SO.

That aggressive crawling to train those on everything is insane.

It's the same thing as the anti-piracy ads: you only annoy legit customers. This aggressive CAPTCHA campaign just makes Stack Overflow decline even faster than it would normally by making it lower quality.
There are tools like curl-impersonate (https://github.com/lwthiker/curl-impersonate) out there that allow you to pretend to be any browser you like. Might take a bit of trial and error, but this mechanism could be bypassed with some persistence in identifying what it is that the resource is trying to block.
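A minimal sketch of driving curl-impersonate from Python, assuming one of the project's wrapper scripts is installed and on your PATH. The wrapper name `curl_chrome116` is an assumption that depends on which release you have, and `build_command` and `fetch` are hypothetical helper names, not part of the project:

```python
import shutil
import subprocess

def build_command(url, wrapper="curl_chrome116"):
    """Return the argv list for a curl-impersonate wrapper script.
    -s (silent) and -L (follow redirects) are ordinary curl flags.
    The default wrapper name is an assumption; check your install."""
    return [wrapper, "-s", "-L", url]

def fetch(url, wrapper="curl_chrome116"):
    """Fetch a URL through the wrapper.
    Returns the response body, or None if the wrapper is not installed
    or the command exits with a non-zero status."""
    if shutil.which(wrapper) is None:
        return None
    result = subprocess.run(
        build_command(url, wrapper),
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout if result.returncode == 0 else None

# Example usage (not run here, since it needs network access):
#   body = fetch("https://example.com")
```

The trial and error would mostly be in swapping the wrapper for a different browser profile until the site stops blocking you.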