Can you launder an AI model by feeding it to some other model or training process? After all, that is how it was originally created, so it cannot be any less legal...
There is a family of techniques for this, usually called “distillation”, where one model is trained on another model’s outputs. There are also various synthetic training data strategies; it’s a very active area of research.
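For anyone who hasn't seen it, here is a rough sketch of the core of one common distillation variant: the student is trained to match the teacher's temperature-softened output distribution, measured by KL divergence. The function names, the temperature value and the toy logits are my own illustration, not taken from any particular library or vendor pipeline.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Illustrative only: real training loops average this over batches and
    often mix it with an ordinary loss on ground-truth labels.
    """
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for a 3-class toy problem.
teacher = [1.0, 2.0, 3.0]
print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student agrees with teacher
print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # student disagrees
```

The point for this thread is that the student only ever sees the teacher's outputs, never its weights or its original training data.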
As for the copyright treatment? As far as I know it’s a bit up in the air at the moment. I suspect that the major frontier vendors would mostly contend that training data is fair use but weights are copyrighted. But that’s because they’re bad people.
That sentiment is ethically sound, logically robust, and directionally consistent with any uniform application of the law as written.
But there is a group of people, growing daily in influence, who utterly reject such principles as either worthy or useful. This group is defined by the ego necessary to conclude that when the stakes are this high, the decisions should be made by them, and that the ends justify the means, including arbitrary antisocial behavior (cf. the behavior of their scrapers), as long as this quasi-religious orgasm of singularity is steered by a firm hand that is willing and able to see it through.
That doesn’t distress me: L. Ron Hubbard had that.
It distresses me that HN as a community refuses to stand up to these people.