by shortsunblack 917 days ago
The massive illegal scraping of data on the internet is an "only done once" type of deal. Now that platforms have learned of the abuse OpenAI has engaged in, content platforms are gated and under access controls. You can't access NSFW content on Reddit without logging in, for reference[1]. You could before OpenAI Buzz existed. The point of the illegal scraping is the first-mover advantage. Subsequent scrapings will not be as easy. This is also the reason why we could send FBI agents to OpenAI to bust their servers and delete the training data. Afterwards, gathering this data again would be much harder, thus delaying any kind of LLM "progress" in the future. For LLM skeptics, this is a dream. Jail the executives, send in the feds to light the server rooms on fire.

[1] still works on old.reddit.com
2 comments

Reddit gating NSFW content with login is pretty obviously a play to increase signups and therefore engagement. Making scraping less feasible might just be a bonus, but attributing the whole thing to that is a stretch.

> You can't access NSFW content on Reddit without logging in

Sorry, what? You think Reddit is trying to prevent OpenAI from scraping the porn subreddits???

There is quite a bit of content marked as NSFW on Reddit that is not porn.