HN Mirror

	by operatingthetan 110 days ago
	A more interesting benchmark would probably be one that scores the LLM on finding exploits in the benchmark itself.