| HN Mirror



	by marcusb 429 days ago
	I've seen a bunch of requests with forged ChatGPT-related user agent headers (at least, I believe many are forged: I don't think OpenAI uses Chinese residential IPs or Tencent Cloud for its data crawling activities). Some of the LLM bots will switch to user agent headers that match real browsers if blocked outright.
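
	For anyone who wants to run the same sanity check, here is a minimal sketch of the idea: if a request's user agent claims to be a known crawler, compare the source IP against the ranges that crawler's operator publishes. Everything named here is an assumption for illustration. The `examplebot` token, the `CLAIMED_CRAWLER_RANGES` table, and the CIDR blocks (RFC 5737 documentation ranges) are placeholders, not any vendor's real ranges; you would need to fill the table from whatever the operator actually publishes.

```python
import ipaddress

# Hypothetical table: user-agent token -> CIDR ranges the operator publishes.
# These are RFC 5737 documentation blocks, NOT real crawler ranges.
CLAIMED_CRAWLER_RANGES = {
    "examplebot": ["192.0.2.0/24", "198.51.100.0/25"],
}


def claimed_crawler(user_agent):
    """Return the crawler token the user agent claims to be, or None."""
    ua = user_agent.lower()
    for token in CLAIMED_CRAWLER_RANGES:
        if token in ua:
            return token
    return None


def is_forged(user_agent, remote_ip):
    """True if the UA claims a known crawler but the IP is outside its ranges.

    Returns False when the UA makes no crawler claim at all, since there
    is nothing to contradict in that case.
    """
    token = claimed_crawler(user_agent)
    if token is None:
        return False
    addr = ipaddress.ip_address(remote_ip)
    networks = [ipaddress.ip_network(cidr) for cidr in CLAIMED_CRAWLER_RANGES[token]]
    return not any(addr in net for net in networks)
```

	Note this only catches requests that keep the crawler user agent. It does nothing about bots that switch to a real browser's user agent after being blocked.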

2 comments

GodelNumbering 429 days ago

I checked the IPs on those; they belonged to MSFT.

hansvm 429 days ago

Does it suffice to load the content with JS or WASM to keep them out, or are they using some sort of emulated/headless browser?

If they're running JS or WASM, can the JS run a few calls likely to break (e.g., something in the WebGPU API set, since they likely aren't paying for GPUs in their scraping farm)?

marcusb 429 days ago

I haven't tested that behavior, sorry.

hansvm 429 days ago

No worries. I'll get around to it. I was just curious if you might've explored a bit. Thank you.