Hacker News
by gchamonlive 23 days ago
I have a different impression: the folks here are divided on this issue, with one half being AI maximalists who say it's a necessary evil, while the other half condemns such practices, maybe not so much to protect copyright per se, but because two different standards are being applied here. While teenagers get ridiculous fines for sharing MP3s, big corp gets a free pass for stealing data on an industrial scale.

If AI were public domain and free for everyone, I would have fewer issues with it (not saying no issues). But yeah, the only ones actually benefiting from this are the big tech corps that have been actively destroying society for over a decade now.

The argument about the ability to self-host doesn't really make sense to me, given that most of society cannot even afford RAM at the moment. So all these big tech frontier models should be public domain.

> because of economic pressures

Self-hosting isn't relevant here anyway. When discussing the hoovering up of information irrespective of licences to produce the model, where the model is finally run isn't significant.

You might not be paying the industry pirates-at-scale to run a model on their hardware, but you are still using the same information, with the same disregard for its creators' wishes, in the same way, just in a different location.

Heck, local hosting might even be making the situation worse if people are trying to train their own models, because they are then likely to be scraping data too, becoming part of the army of bots that is pushing hosting costs up and forcing everyone to use tricks like PoW scripts, which can inconvenience human readers as much as the scrapers.

> You might not be paying the industry pirates-at-scale to run a model on their hardware, but you are still using the same information, with the same disregard for its creators' wishes, in the same way, just in a different location.

For individual use I personally think it's ok. Access to information shouldn't be penalized or regulated, but distribution should. So in this case it's relevant where a bootleg model is run.

And another half are copyright abolitionists like me, who don't care about AI at all but see copyright as essentially a societal fiction. Even if it was useful in the past, it no longer is, or rather, it is only useful to big corporations throwing their weight around, like Disney, which lobbied the government to implement its infamous Mickey Mouse laws with ridiculously long copyright terms.
I agree with you to an extent, but I think that when people profit from a work (e.g. by using it to train a proprietary AI that they charge people to use) they should share the profit with the author of the work.

So I think Anna's Archive is fine. OpenAI is not.

That's why I believe in open-weight or even open-source AI models. If you're gonna train, you might as well democratize access for everyone, not the faux "democratization" that OpenAI and Anthropic talk about, where only they control access.
I didn't want to go into this topic, but I'm right here with you: I'm an information access anarchist.