Hacker News
by nCave 1397 days ago
The elephant in the room with all of these big AI models is whether we continue to allow any entity to scrape individuals' data off the internet and use it in the development of their software without any consideration for the rights of the owner of the data.

Just because there isn't a perfect 1:1 copy of the data in their released software doesn't mean we should ignore the fact that this data, which they do not have rights to, is critical for the development of their software.

Many people are conflating this issue with the copyright of the software output (i.e., "you can't copyright a style"). But many artists are rightfully angry that their creative products, which they own the rights to, are being used as a critical component in the development of someone else's software and they have no say in it. They also have no recourse to get their data removed from the development chain.

This is not a new problem, of course. But with the image generators, it has become more explicit how the output relates to the input: "Give me a painting of a clock by Salvador Dali".

I think in terms of data usage rights, we need to consider the inputs of these models and not just the outputs. Data laundering through AI is only going to get worse.