by stormfather 565 days ago
And they have the absolutely massive advantage of being able to associate content with the queries that led to it, and of knowing which piece of content the user selected. That can surely give them a leg up both in choosing good training data and in building o1-type agentic models.

You’re right. They can actually do RLHF using just their users, by showing each of them slightly different generations and watching how they behave.
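A rough sketch of what that could look like, assuming a hypothetical log format where each record holds a prompt, the variant generations shown, and the index the user picked. Each pick becomes (prompt, chosen, rejected) pairs, and a reward model can be trained on those pairs with the Bradley-Terry style loss commonly used in RLHF. The `Impression` record and function names are made up for illustration, and a click is only a noisy proxy for preference.

```python
import math
from dataclasses import dataclass


@dataclass
class Impression:
    # Hypothetical log record: one prompt, the variants shown, and the index the user picked.
    prompt: str
    candidates: list
    chosen: int


def to_preference_pairs(log):
    """Turn each user pick into (prompt, chosen, rejected) pairs, one per non-picked variant."""
    pairs = []
    for imp in log:
        winner = imp.candidates[imp.chosen]
        for i, loser in enumerate(imp.candidates):
            if i != imp.chosen:
                pairs.append((imp.prompt, winner, loser))
    return pairs


def bradley_terry_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected), computed in a numerically stable way."""
    x = r_chosen - r_rejected
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


log = [Impression("how to sort a dict by value", ["A", "B", "C"], chosen=1)]
pairs = to_preference_pairs(log)
# pairs == [("how to sort a dict by value", "B", "A"), ("how to sort a dict by value", "B", "C")]
```

The loss is small when the reward model scores the picked generation above the skipped one and large otherwise, so minimizing it over many users' picks pushes the reward model toward what people actually choose.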