| HN Mirror

	by rad-b 53 days ago
	Seems interesting, but testing myself only yields my own results. How would I compare them to a frontier model's results? That part seems to be missing. Also, the tests seem heavily skewed toward what LLMs are good at.