

by TeMPOraL 502 days ago

Enforcing copyright on training data to this extent would actually create a temporary moat for the biggest players - they can afford to hire a lot of cheap labor to supplement the training dataset with human-authored original works that skirt IP protections by interpreting, parodying, commenting on or otherwise describing the protected works without actually infringing on them. As long as they keep those datasets private, everyone else is shit out of luck.

(I'm reiterating my prediction wrt. AI and moats - the only mid-term moat there can be is in human labor. Hardware vendors benefit from selling better hardware to more people for less; software and research are cheap to scale, datasets eventually leak or get reproduced. Human labor is the one thing that doesn't scale, and except for an economic crisis, only ever gets more expensive with time. Whatever edge one can get by applying human labor that cannot be substituted by AI - like RLHF and its evolutions - is the one that will last all the way to AGI; past that, moats won't matter anymore.)

One of the many reasons I'm firmly on the side of making the training of large neural models exempt from copyright considerations for everyone.

1 comment

fulafel 501 days ago

Isn't training already exempt from copyright? At its core, copyright is about licensing who's allowed to distribute copies of content (not the ideas, but the exact same text, etc.).

edit: apparently in the EU the situation is complicated by new AI-specific legislation in the works: https://www.morganlewis.com/pubs/2024/02/eu-ai-act-how-far-w...
