| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by armcat 498 days ago
	I am not too sure about shortening the CoT tokens explicitly because different problems will require different length of proof - some require half a page, whilst others will require 10 pages worth of tokens. As the graphs in the paper indicate, there is a huge penalty on short reasoning lengths, below a few thousand tokens. For diversity reward, my thinking is basically looking at reasoning tokens in latent space - taking semantic similarity between subsequent chains, and if they are extremely similar, penalizing it.