by truelinux1 103 days ago
I gave several LLMs a logic puzzle with an embedded contradiction. Some flagged it. Some quietly bent the rules to produce an answer anyway.

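Knowing which type of model you're using (a helpful solver or a strict judge) matters: the solver will hand you an answer even when the puzzle has none, while the judge will tell you the premises conflict.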