Hacker News new | ask | show | jobs
by mmaaz 569 days ago
I tried it with a certain conceptual problem in computer algebra (which I’ve had dismal results on GPT o1-preview and o1-mini… sort of a private benchmark) and it spent 2 minutes arguing with itself about what a Python function was called.