by dghlsakjg
529 days ago
> Publicly publishing information for others to access and then complaining that ~1 rps takes your site down is not sympathetic. I don't know what the actual numbers and rates are because they weren't reported, but the fact that they weren't reported leads me to assume they're just trying to get some publicity.

They publicly published the site for their customers to browse, with the side benefit that curious people could also use the site in moderation, since that wasn't affecting them in any real way. OpenAI isn't their customer, and its use is affecting them in terms of hosting costs and lost revenue from downtime. The obvious next step is to gate that data behind a login, and now we (the entire world) all have slightly less information at our fingertips because OpenAI did what they do.

The point is that OpenAI, or anyone doing massive scraping ops, should know better by now. Sure, the small company that doesn't do web design had a single file misconfigured, but that shouldn't be a 4 or 5 figure mistake. OpenAI knows what bandwidth costs. There should be a mechanism that says: hey, we have asked for many gigabytes or terabytes of data from a single domain in one scrape, and that is a problem.
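
To make the idea of such a mechanism concrete, here is a minimal sketch of a per-domain byte budget on the crawler side. Everything in it is hypothetical: the `DomainByteBudget` class, its method names, and the cap are made up for illustration, and nothing here describes how OpenAI's crawler or any other real crawler actually works.

```python
from urllib.parse import urlsplit


class DomainByteBudget:
    """Hypothetical guard: tracks bytes fetched per host and refuses
    further requests once a host has reached its cap."""

    def __init__(self, max_bytes_per_host):
        # Assumed policy: one flat cap applied to every host.
        self.max_bytes_per_host = max_bytes_per_host
        self.bytes_by_host = {}

    @staticmethod
    def _host(url):
        # Hostname of the URL, or "" if it has none.
        return urlsplit(url).hostname or ""

    def record(self, url, num_bytes):
        """Add the size of a completed response to the host's running total."""
        host = self._host(url)
        self.bytes_by_host[host] = self.bytes_by_host.get(host, 0) + num_bytes

    def allowed(self, url):
        """Return True while the host is still under its byte cap."""
        return self.bytes_by_host.get(self._host(url), 0) < self.max_bytes_per_host
```

A crawler loop would call `allowed(url)` before each fetch and `record(url, len(body))` after it, then stop and flag the domain for a human to look at once `allowed` returns False, instead of continuing to pull data.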