Hacker News
by babypuncher 1108 days ago
Some insist this is about keeping companies from training LLMs on Reddit's data for free.

But I call bullshit. People training new AI models are using the API right now because it's free and easy, but as soon as that changes they will go back to good old-fashioned web scraping.

1 comments

People training AI were already using CommonCrawl. There are too many data sources to figure out each API. Everyone just downloads CC from AWS.