You can't copyright facts. There is a legal argument to be made that an LLM is a reduction of copyrighted material into its underlying data (its vectors).
The only people who are going to win in this fight are the lawyers.
Clearly there is a limit. Otherwise, you could circumvent all copyright by saying "The contents of Harry Potter and the Prisoner of Azkaban are <insert novel text here>". While that is technically a fact, the text is still protected by copyright.