| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by snyhlxde 496 days ago
	It's funny that reasoning models sometime speaking nonsense and perform worse than well-aligned models like claude-3.5-sonnet in multi-turn games like Akinator. I think it's one current weak point of applying longCoT RL vs. instruction-following alignment. Maybe we need to find a way to address both? Would be interesting to see some results