Hacker News
GPT-5.2, Grok 4.1, and DeepSeek v3.2 compare as Santa agents (veris.ai)
4 points by _josh_meyer_ 187 days ago | 2 comments
_josh_meyer_ 187 days ago
SantaBench, a fun benchmark with a serious methodology. The task: play a cheeky Santa agent who researches users online and roasts them based on their social media.
_josh_meyer_ 187 days ago
OP here -- I work at Veris and built this. Happy to answer questions about the methodology!