Scraping the internet isn't a copyright violation. Using it for LLM training is much more transformative than what Google and the Internet Archive do, both of which are legal.
Except it ignores the entire premise of copyright, which is to protect incentives to create original work. Google does not destroy those incentives; LLMs (very loudly and proudly) try to.
There are several components of the Fair Use test; "transformation" is just one of them. The most important dimension is the effect on the market, i.e. the effect on incentives.
You probably shouldn't base your legal analysis on pithy internet comments regardless of how succinct or agreeable they are to you.
You're right, scraping is legally protected. It's reproducing verbatim text that's a violation, which is why LLMs still clumsily refuse to produce song lyrics. They are capable of copyright violations and have to be 'aligned' not to commit them, so that their providers don't get sued.
Verbatim reproduction is neither necessary nor sufficient to create a copyright violation.
"Copyright violation" is what we call the set of things that destroy the incentive for people to create original work by unduly benefitting from someone else's original work.
And just like that, I totally agree with you