Hacker News
by woffoor 5 days ago
Most scraped websites use HTTPS, so you need to perform a MITM attack. Scrapers will probably notice that.
2 comments

No, you just need to stand up your own website and feed the scraper a URL to it.
I would just generate scads of Markov chain output and make it look like a plausible web page.
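For anyone curious what that might look like, here is a rough sketch in Python, not a tested tarpit. The names `build_chain`, `generate`, and `as_page` are made up for illustration, and the corpus is assumed to be any plain-text string you already have lying around.

```python
import html
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=50, seed=None):
    """Walk the chain to produce `length` words of plausible-looking filler."""
    if not chain:
        raise ValueError("corpus too short for the chosen order")
    rng = random.Random(seed)
    keys = list(chain.keys())
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: jump to a random state and keep going.
            state = rng.choice(keys)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out[:length])


def as_page(title, body):
    """Wrap the text in minimal HTML so it passes for a web page (hypothetical layout)."""
    return (
        "<!doctype html><html><head><title>"
        + html.escape(title)
        + "</title></head><body><p>"
        + html.escape(body)
        + "</p></body></html>"
    )
```

Feed `build_chain` a few megabytes of real prose and serve `as_page("Some title", generate(chain, 500))` per request; a higher `order` reads more naturally but repeats the corpus more closely.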
That's pretty much what the bots are scraping now, with all the AI slop websites out there.
Fair point well made.
How would https affect it?

If they're making a request to my machine to go and curl a page, how do they even know whether or not it was https?

Not sure about Bright Data, but these are usually SOCKS or HTTP CONNECT proxies because that's the most flexible setup. But the customer might be paying by the gigabyte, so you can still feed them nonsense, maybe a 4-gigabyte TLS certificate.
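To make the HTTPS question concrete: with an HTTP CONNECT proxy, the exit machine is only handed a `host:port` target and then relays opaque bytes, so the port (443 or not) is a hint rather than proof of TLS. A minimal sketch of what the exit node gets to parse is below; `parse_connect` is a hypothetical helper, and nothing here reflects how Bright Data or any specific provider actually works.

```python
def parse_connect(request_line):
    """Return (host, port) from an HTTP CONNECT request line, or None.

    Illustrative only: a real proxy would also read the headers and
    then relay raw bytes in both directions.
    """
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0] != "CONNECT":
        return None
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    # Strip brackets from IPv6 literals such as [::1]:443.
    return host.strip("[]"), int(port)


def probably_tls(port):
    """Guess from the port alone; this is a heuristic, not a guarantee."""
    return port == 443
```

So `parse_connect("CONNECT example.com:443 HTTP/1.1")` yields the host and port, and everything after that is just a byte stream the exit node can answer however it likes.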