by joelkoen 529 days ago
> “OpenAI used 600 IPs to scrape data, and we are still analyzing logs from last week, perhaps it’s way more,” he said of the IP addresses the bot used to attempt to consume his site.

The IP addresses in the screenshot are all owned by Cloudflare, which means the site's server logs are only recording the IPs of Cloudflare's reverse proxy, not the real client IPs.
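For anyone wondering what recording the real client IP would look like, here is a minimal sketch. It assumes the origin sits behind Cloudflare and reads the `CF-Connecting-IP` header that Cloudflare adds to proxied requests. The function name `real_client_ip` and the two-entry network list are illustrative only; a real setup should load the full range list that Cloudflare publishes rather than hard-coding a subset.

```python
import ipaddress

# Illustrative subset only (assumption): load the full, current list of
# ranges from Cloudflare's published list instead of hard-coding it.
TRUSTED_PROXY_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("173.245.48.0/20", "104.16.0.0/13")
]


def real_client_ip(remote_addr, headers):
    """Return the best guess at the real client IP for one request.

    remote_addr: the peer address the server saw (what ends up in the log).
    headers: a plain dict of request header names to values.
    """
    peer = ipaddress.ip_address(remote_addr)

    # Only trust the forwarding header when the direct peer is a known
    # proxy; otherwise any client could spoof it.
    if any(peer in net for net in TRUSTED_PROXY_NETS):
        lowered = {name.lower(): value for name, value in headers.items()}
        forwarded = lowered.get("cf-connecting-ip")
        if forwarded:
            try:
                return str(ipaddress.ip_address(forwarded.strip()))
            except ValueError:
                pass  # Malformed header: fall back to the peer address.

    return remote_addr
```

Logging the result of this instead of the raw peer address would show whether the traffic really came from hundreds of distinct clients or from a handful.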

Also, the logs don't show any timestamps and there doesn't seem to be any mention of the request rate in the whole article.

I'm not trying to defend OpenAI, but as someone who scrapes data, I think it's unfair to throw around terms like "DDoS attack" without providing basic request rate metrics. The claim seems to be based purely on the use of multiple IPs, which was actually caused by the site's own server configuration and has nothing to do with OpenAI.
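By "basic request rate metrics" I mean something as simple as requests per minute. A rough sketch, assuming the log timestamps have already been parsed into `datetime` objects (the helper names `requests_per_minute` and `peak_rate` are made up for illustration):

```python
from collections import Counter
from datetime import datetime


def requests_per_minute(timestamps):
    """Count requests in each one-minute bucket."""
    # Truncate each timestamp to the start of its minute and tally.
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def peak_rate(timestamps):
    """Return the highest requests-per-minute count, or 0 for no requests."""
    buckets = requests_per_minute(timestamps)
    return max(buckets.values()) if buckets else 0
```

A peak figure like this, together with the time window it covers, is what would let readers judge whether the traffic looked like an attack or just an aggressive crawl.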

1 comment

Why should web store operators have to be sophisticated enough to use exactly the right technical language in order to have a legitimate grievance?

How about this: these folks put up a website in order to serve customers, not for OpenAI to scoop up all their data for their own benefit. In my opinion data should only be made available to "AI" companies on an opt-in basis, but given today's reality OpenAI should at least be polite about how they harvest data.