Hacker News
by leobg 566 days ago
Google has one moat that is often overlooked: Googlebot. They get to scrape content that is invisible to pretty much every other crawler, because Cloudflare and paywalls let Googlebot through while blocking most other bots.
2 comments

And they have the absolutely massive advantage of being able to associate content with the queries that led to it, and of knowing which piece of content the user selected. That surely can be used in some way to give them a leg up both in choosing good training data and in building o1-type agentic models.
You’re right. They can actually do RLHF using just their users, showing each of them slightly different generations and watching their behavior.
Most of the content they crawl is SEO spam; I'm not sure it's that helpful for model training.
SEO spam is the façade you get to see as a user. The gold is all that you don’t see. Just because they don’t show it on page one doesn’t mean it won’t be useful for training.