| HN Mirror

	by startupsfail 205 days ago
	There are still blatant failure modes where models slip into clear sycophancy rather than simply expressing enthusiasm. I'd guess that, in practice, a benchmark like this vibesbench could help catch unhelpful, blatant sycophancy failures.