Hacker News
AGCI: A Benchmark for Testing Long-Chain Reasoning Stability in AI Models (dropstone.io)
1 point by daredevil49 214 days ago