| HN Mirror

	by stared 117 days ago
	I reran it with the "high" and "xhigh" effort settings, and GPT-5.2-Codex still gets a 0% false positive rate, while matching the other best models at localizing backdoors: https://quesma.com/benchmarks/binaryaudit/