Google has one moat that is often overlooked: Googlebot. They get to scrape content that is invisible to pretty much every other crawler, because Cloudflare and paywalls let Googlebot through while blocking everyone else.
And they have the absolutely massive advantage of being able to associate content with the queries that led to it, and of knowing which result the user actually selected. That surely can be used in some way to give them a leg up both in choosing good training data and in building o1-type agentic models.
SEO spam is the façade you get to see as a user. The gold is all that you don’t see. Just because they don’t show it on page one doesn’t mean it won’t be useful for training.