You're asking the right questions. The going theory, as far as I can see, is that training models is fair use (although that may not be fully resolved in the courts), in which case this whole exercise would seem to be pointless. If it were that easy, I have to think the FSF etc. would have been all over this years ago.
And training is currently considered fair use in the US (with some court cases still pending).
I am not a lawyer, though.