| HN Mirror

	by jmalicki 28 days ago
	With RLHF and RLVR we are creating tons of new training data that is much more focused than what you get from reading the Internet. Annotation shops are bringing in many billions of dollars per year in revenue creating new data, and a lot of it is highly complex, focused on rewarding multi-turn agentic trajectories.