
	by AIPedant 443 days ago
	Even if they were subject matter experts, it's mentally exhausting to judge these things, especially if it's just for an RLHF contracting gig and you're not actually using the report for real work. Even honest and motivated testers would be tempted to rely on surface "vibes" plus the absence of immediately obvious whoppers. OpenAI's Deep Research seems oddly restricted in the number of sources it uses, e.g. repeating one survey article over and over. I suspect it is just too draining and demoralizing for RLHFers to check Deep Research's citations (especially without a formal bibliography).