| HN Mirror



	by causal 196 days ago
	That ARC AGI score is a little suspicious. That's a really tough benchmark for AI. Curious if there were improvements to the test harness, because that's a wild jump in general problem-solving ability for an incremental update.

2 comments

woeirua 196 days ago

They're clearly building better training datasets and doing extensive RL on these benchmarks over time. The out-of-distribution performance is still awful.


taurath 196 days ago

I don’t think their words mean much of anything; only the behavior of the models does.

Still waiting on Full Self Driving myself.
