Hacker News
AGCI: A Benchmark for Testing Long-Chain Reasoning Stability in AI Models (dropstone.io)
1 point by daredevil49 214 days ago