by TeMPOraL
502 days ago
Enforcing copyright on training data to this extent would actually create a temporary moat for the biggest players - they can afford to hire a lot of cheap labor to supplement the training dataset with human-authored original works that skirt IP protections by interpreting, parodying, commenting on or otherwise describing the protected works without actually infringing on them. As long as they keep those datasets private, everyone else is shit out of luck. (I'm reiterating my prediction wrt. AI and moats - the only mid-term moat there can be is in human labor. Hardware vendors benefit from selling better hardware to more people for less; software and research are cheap to scale; datasets eventually leak or get reproduced. Human labor is the one thing that doesn't scale and, barring an economic crisis, only ever gets more expensive with time. Whatever edge one can get by applying human labor that cannot be substituted by AI - like RLHF and its evolutions - is the one that will last all the way to AGI; past that, moats won't matter anymore.) This is one of the many reasons I'm firmly on the side of making the training of large neural models exempt from copyright considerations for everyone.

edit: apparently in the EU the situation is complicated by new AI-specific legislation in the works: https://www.morganlewis.com/pubs/2024/02/eu-ai-act-how-far-w...