I've had no problem using Microsoft's toil for free by downloading free Windows ISOs all my life, so if they want to pirate my GitHub code, it's not bad enough to care about. Aside from the bad practices the model might internalize as a result, that is.
Good for you. Don't post your code on GitHub then, as they have express terms of service about being able to use submitted code for business purposes, including AI model training.
I made it my mission to get the lot of them mad. There are plenty of legitimate AI companies out there, but YC seems fond of the unethical ones, which explains the influx of IP-stealing startups on here and their simps.
It will be interesting to see how it plays out. I can imagine Wiley, McGraw Hill, Pearson, and other publishers[0] of educational content that OpenAI used could sell the rights to their material to be used for training GPT, but the price would be high enough that we would be paying $100/month instead of $20.
[0] Heck, they could even unite and found an LLM startup themselves, training the models legally and making them available to users at various tiers.