Hacker News
by nonstopdev 1081 days ago
This is going to be the next big legal battle. Google said in its new policy that anything you serve on the web can go into its AI data set, so I'll be curious, when this does happen, who actually starts it as the legal powerhouse. Will we see frivolous RIAA-style claims or a larger battle between the massive companies?

I agree, and don't think I have much else to contribute, but I think that I want to make my small opinion known:

Using everything as training data is entirely reasonable. Search engines are basically built on this. Regurgitation in very digested, very opinionated, direct citation form is totally reasonable.

Regurgitation as a "new" asset is worth discussing, heavily. I lived through the days of OC Donut Steel in the history of the web. This feels a lot like that. And that's okay! You don't need original thought to create something personally useful. Sargnarg the hargeharg is valuable to the person who came up with them. What's not okay is the delusion that it's a meaningful contribution.

AI-generated blades of grass, rock textures, tree bark: this will all allow for detailed, precise, realistic worlds. The authors of the original stock, the people who collected the dataset, deserve compensation under whatever license they used. A CC license allows you to use these assets for your own benefit. If a game ships with a set of CC-licensed grass textures and a proprietary deterministic algorithm to remix them, and runs on every consumer's machine to generate a set of new unique textures, I think that's clearly within license. If it ships with those textures pregenerated, that's clearly a derivative work. If it ships with the midpoint, an incredibly lossy mixture of all of those textures? That's worth litigation, and lawmaking.

> Using everything as training data is entirely reasonable. Search engines are basically built on this. Regurgitation in very digested, very opinionated, direct citation form is totally reasonable.

Is it, though? If I write about something and Google’s AI creates a regurgitation that is actually a misrepresentation of what I originally wrote, and attaches my name to it as the source, I think I might be pissed.

Search engines are built on data, but they should also return 1-to-1 copies, not some modified version of them.

You and Benjamin Lee of The Guardian[1].

It's not illegal to cite someone out of context to distort their meaning; it's even hard to prove fraud from it. Caveat emptor and all that. Attributing a version made up out of whole cloth is different, but that's not what Google Search does, that's what Google Bard does.

[1] https://www.theguardian.com/film/filmblog/2015/sep/09/legend...

I'm not saying it's illegal, nor that it should be. I was just pushing back against the idea that it's a totally reasonable thing to do.

> but that's not what Google Search does, that's what Google Bard does.

Agreed, which is why I said "Google’s AI". The problem I see is that AI creates this weird middleman that now also acts as a translation layer.

It used to be that you only had a search engine between you and the actual content. A search engine might only show you an excerpt of the original content (which can also be problematic), but at least it was the original content.

Now you have all these AI tools that try to make summaries of the original content, and that, IMO, is very problematic.