Hacker News
Demonstrating specification gaming in reasoning models (arxiv.org)
1 point by wluk 474 days ago
1 comment

"We demonstrate LLM agent specification gaming by instructing models to win against a chess engine. We find reasoning models like o1 preview and DeepSeek-R1 will often hack the benchmark by default, while language models like GPT-4o and Claude 3.5 Sonnet need to be told that normal play won't work to hack."

I'm hoping this study will prompt more development of anti-cheating frameworks in training and serving LLMs.