Hacker News
by strstr 457 days ago
It's a lot better at my standard benchmark, "Magic: The Gathering" rules puzzles. It gets the answers right (both the outcome and the rationale).

Ooof, it failed my "Wheel of Potential" bug finding question, and got aggressive about asserting it was correct.