It only happens if you bait it really hard and push it into a corner. That's not representative at all. I use Copilot to write highly niche code that's based on my own repo. It's simply amazing at understanding the context and suggesting things I was about to write anyway. Nothing it produces is just copy-pasted character by character. Not even close.
As others have pointed out, it means the model contains copyrighted material. So I guess that's totally illegal. It's like if I ripped a Windows ISO, zipped it up and shared it with half the world. You know what would happen to me, don't you?
Not the same thing at all. The data isn't just sitting there in a store inside the model that you can query. No one would be able to look at the raw data and find any copyrighted material, even if all it was trained on was copyrighted code (which I agree is an issue).
That said, GitHub would do well to also bake in appropriate attributions if a significant portion of the generated code is copypasta.