| HN Mirror



	by justcallmejm 355 days ago
	Aloe's neurosymbolic system just beat OpenAI's deep research score on the GAIA benchmark by 20 points. While Gary is full of bluster, he does know a few things about the limitations of LLMs. :) (aloe.inc)

1 comment

nojvek 354 days ago

Yeah, there was an old paper that blew math/physics benchmarks out of the water by letting the LLM write code and having a physics engine execute it. I don't have a link to it off the top of my head, but that seems to be the right direction.

LLM + general tool use seems to be quite effective.
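
As a rough illustration of the "model writes code, a tool executes it" pattern described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `fake_model_write_code` is a hypothetical hard-coded stand-in for an LLM call, the free-fall question is a made-up example, and a plain Python subprocess stands in for the physics engine. It is not the method from the paper mentioned above.

```python
import subprocess
import sys


def fake_model_write_code(question: str) -> str:
    """Hypothetical stand-in for an LLM call: returns a program as text."""
    # A real system would send `question` to a model; this stub is hard-coded
    # to a free-fall problem (time to fall 20 m from rest, no air resistance).
    return (
        "import math\n"
        "g = 9.81  # gravitational acceleration, m/s^2\n"
        "h = 20.0  # drop height, m\n"
        "print(round(math.sqrt(2 * h / g), 3))\n"
    )


def run_generated_code(code: str, timeout_s: float = 5.0) -> str:
    """Run the generated program in a separate Python process; return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise if the generated program exits with an error
    )
    return result.stdout.strip()


def answer(question: str) -> str:
    """The model proposes code, the tool computes the number."""
    code = fake_model_write_code(question)
    return run_generated_code(code)


if __name__ == "__main__":
    print(answer("How long does an object take to fall 20 m from rest?"))
```

The point of the split is that the arithmetic is done by the interpreter rather than by the model. Note that a separate process with a timeout is not a security sandbox; running untrusted generated code would need real isolation.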
