by seanmcdirmid
908 days ago
If you watch a bunch of movies and then go on to make your own movie influenced by them, you are protected, even if you have mentally compressed them into your own movie. At one end, you learn from, are influenced by, and are inspired by copyrighted material (not copyright infringement); at the other end, you are just making a poor copy of the material (definitely copyright infringement). LLMs are probably still closer to the latter case than the former, but eventually AI will reach the former.

Even if LLMs can't cite their influences with current technology, that can't be a free pass to keep doing things this way. Of course, all data brokers resist data-lineage requirements for themselves while wanting to impose them on others. Besides copyright, it's common for datasets to have all kinds of other legal encumbrances, like "after paying for this dataset, you can do anything you want with it, except JOINs with this other dataset". Lineage is expensive and difficult but not impossible. Statements like "we're not doing data-lineage and wish we didn't have to" are always more about business operations and desired profit margins than technical feasibility.
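
To make the lineage point a bit more concrete, here is a minimal Python sketch of what tracking it could look like. Everything in it is hypothetical and made up for illustration (the `Dataset` class, the `raw` and `join` helpers, the vendor names); it is not taken from any real lineage tool. The idea is only that each derived dataset carries the set of original sources it came from, plus any "no JOIN with X" terms, so a forbidden combination can be rejected even several steps downstream.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A dataset plus its lineage (hypothetical structure, for illustration)."""
    name: str
    sources: frozenset[str]                        # every original dataset this derives from
    forbidden_joins: frozenset[str] = frozenset()  # sources it may not be combined with


def raw(name: str, forbidden: tuple[str, ...] = ()) -> Dataset:
    """Register an original (non-derived) dataset and its license terms."""
    return Dataset(name, frozenset({name}), frozenset(forbidden))


def join(left: Dataset, right: Dataset, name: str) -> Dataset:
    """Combine two datasets, refusing if either side's terms forbid it."""
    for a, b in ((left, right), (right, left)):
        clash = a.forbidden_joins & b.sources
        if clash:
            raise PermissionError(
                f"{a.name} may not be joined with data derived from {sorted(clash)}"
            )
    # The result inherits the lineage and the restrictions of both inputs.
    return Dataset(
        name,
        left.sources | right.sources,
        left.forbidden_joins | right.forbidden_joins,
    )


# Assumed example: vendor_a's license forbids JOINs with vendor_b.
vendor_a = raw("vendor_a", forbidden=("vendor_b",))
vendor_b = raw("vendor_b")
public_c = raw("public_c")

a_plus_c = join(vendor_a, public_c, "a_plus_c")  # allowed
# join(a_plus_c, vendor_b, "everything")         # raises PermissionError
```

The bookkeeping itself is cheap. The expensive part in practice is doing this consistently across every pipeline and every copy of the data, which is a business cost rather than a technical impossibility.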