| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mentalgear 409 days ago
	"Despite using zero human-curated data, AZR achieves state-of-the-art results on diverse coding and math reasoning benchmarks, even outperforming models trained on large in-domain datasets. This demonstrates the potential for sophisticated reasoning skills to emerge purely through self-play without domain-specific supervision."

1 comments

wiz21c 408 days ago

> "sophisticated reasoning skills"

Does it mean that it uses the data it has to the maximum possible level to produce new reasoning (that add to those produced by less algorithms). IOW, are we still in the realm of: with a given data set, A.I. can produce up to N reasoning capabilities and consequently, can't produce more than that ? IOW, reasoning is bound by knowledge ? And therefore, maybe we could just start from a data/knowledge set in which we add some randomness and self play until some form of reasoning emerge ?

MoonGhost 408 days ago

Up to N at a time probably. Then move on using them. The problem is the longer the chain, the more likely it will deviate from the reality. It will include non-obvious atomic decisions and wrong assumptions. This will make the whole thing unstable. I.e. without strict human supervision it likely will start producing crap. Probably some self double checks can help, but still. On the other hand humans aren't that smart either...