Hacker News
by throw10920 618 days ago
> This is a valid use as any of that archived data.

No, it's really not, as most of the people who actually spent the time and effort to produce that content did not consent to it being used to train AI.

> copyright & capitalism

That's a really disingenuous way to say "the creators of that data didn't consent to training or commercial use and I want to steal their effort".

2 comments

I was actually going for the dynamic where sharing isn't caring in this space. In theory it would be great if a few good companies crawled the internet for you and sold access to it, but in practice those companies are pushed to charge an arm and a leg, which pushes medium-to-large companies to crawl it themselves.

I don't consent to paying rent, but I still have to. If it's legal for one party it should be legal for all parties. The law shouldn't pick favourites. If ChatGPT (backed by Microsoft) can copy my data, I can download unlicensed Windows. If I can't, it can't.

Yes, I completely agree that the law shouldn't pick favorites.

To clarify: the creators of the majority of online content haven't consented to their content being used to build AI models for any company or organization. For US-based "creators", that includes both domestic companies like Anthropic, OpenAI, and Google and foreign companies like ByteDance.