| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ankit219 552 days ago
	At this point, it's a function of how many thinking tokens can a model generate. (when it comes to o1 and r1). o3 is likely going to be superior because they used the training data generated from o1 (amongst other things). o1-pro has a longer "thinking" token length, so it comes out as better. Same goes with o1 and API where you can control the thinking length. I have not seen the implementation for r1 api as such, but if they provide that option, the output could be even better.