HN Mirror

	by scoresmoke 1024 days ago
	Thank you! I excluded the coding tasks because most annotators don't have that expertise. I trust them to compare pairs of dissimilar model outputs, which requires no specific skill beyond commonsense reasoning. The only manual analysis I did was checking the passed/failed prompts of the top-performing model.