


	by mentalgear 62 days ago
	This. Plus, if you even want to attempt to measure real 'intelligence', you want to run a neuro-symbolic, de-lexicalized benchmark (e.g. DL-ReasonSuite, SoLT, GSM-Symbolic), which none of the providers releasing new models showcase.