by enord 877 days ago
Listen, most website and book authors want to be indexed by Google. It brings potential audience their way, so most don't make use of their _right_ to be de-listed. For these models, there is no plausible benefit to the original creators, so one has to argue they have _no_ such right to be “de-listed” in order to get any training data that is currently under copyright.

> It brings potential audience their way, so most don’t make use of their _right_ to be de-listed.

The Authors Guild lawsuit against Google Books ended in a 2015 ruling that Google Books is fair use, so authors don't have a right to be de-listed. It's not the case that they have a right to be de-listed but choose not to use it.

The same would apply if collating data for machine learning datasets were found to be fair use.

> one has to argue they have _no_ such right to be “de-listed” in order to get any training data currently under copyright.

The datasets I'm aware of already respect machine-readable opt-outs, so if that were legally enforced (as it is by the EU's DSM Directive for commercial text and data mining), I don't think it'd be the end of the world.

There's a lot of power in a default: the set of "everything minus opted-out content" will be significantly bigger than "nothing plus opted-in content", even if rightsholders hold exactly the same opinions in both cases.
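To make the "default" point concrete, here is a rough sketch of an opt-out filter, assuming the opt-out is expressed through robots.txt (one common machine-readable mechanism; I'm not claiming any particular dataset uses exactly this). "ExampleAIBot" and the URLs are hypothetical. The key design choice is that anything without an explicit opt-out is kept, which is why inaction ends up on the "include" side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site opts out of one (made-up) AI-training
# crawler but leaves everything open to all other user agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_collect(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse local text, no network fetch
    return parser.can_fetch(user_agent, url)


def filter_opted_out(urls, robots_txt, user_agent):
    """Opt-out default: keep every URL unless the site has excluded this crawler."""
    return [u for u in urls if may_collect(robots_txt, user_agent, u)]


if __name__ == "__main__":
    urls = ["https://example.com/essay", "https://example.com/gallery/1"]
    print(filter_opted_out(urls, ROBOTS_TXT, "ExampleAIBot"))  # opted out
    print(filter_opted_out(urls, ROBOTS_TXT, "OtherBot"))      # default include
```

An opt-in regime would flip the last function to keep only URLs with an explicit "allow" for the crawler, so the same site owners doing nothing would yield an empty set instead.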

With the caveat that I was exactly wrong about the Google Books de-listing, I feel you are making my point for me and retreating to a more pragmatic position about defaults.

The (quite entertaining) saga of Nightshade tells a story about what is going to be content creators' “default position” going forward and everyone else will follow. You would be a fool not to; the AI companies are trying to end-run you using your own content, make a profit without compensating you, and leave you with no recourse.

> I feel you are making my point for me and retreating to a more pragmatic position about defaults

I'm unclear on what stance I've supposedly retreated from. My position is that an opt-out is not necessary under current US law, but that it wouldn't be the worst-case outcome if new regulation were introduced to mandate it.

> The (quite entertaining) saga of Nightshade tells a story about what is going to be content creators' “default position” going forward and everyone else will follow

By "default" I refer not to the most common choice, but to the outcome that results from inaction. There's a bias towards this default even if the majority of rightsholders do opt to use Nightshade (which I think is unlikely).