HN Mirror


by swyx 7 days ago

to you it may do, idk. note that if you scroll past fig 1 you get into a nice data explorer that breaks out pass@5 by reasoning level, with token, $, and step costs visualized. i think some other commenters on this hn thread got very worked up about stuff we actually agree on.

internally i've charted everything and am satisfied that there's no meaningful rank bias introduced. we've sliced it every which way. in fact we have not even published the best-looking charts for telling this story, because we have further publishing plans on frontiercode.

tldr: “trust me bro”, this isn't the issue, and if anything we could've done more to increase N, as tedsanders below points out.