CommonCrawl contains copyrighted content too. You gain copyright on your work automatically the moment you create it, and that includes this very comment.
One could argue that using copyrighted content in LLMs, much like reposting, should fall under fair use. This is also Microsoft's claim in the GitHub Copilot lawsuits. It's up to the courts to decide, though. (IANAL)