| This makes a good benchmark for LLMs:
```
look at this paper: https://arxiv.org/pdf/2603.21852 now please produce 2x+y as a composition on EMLs
```
Opus (paid) - claimed that "2" is circular. Once I told it that ChatGPT had already done this, it finished successfully.
ChatGPT (free) - did it on the first try.
Grok - produced an estimate of the depth of the formula.
Gemini - success.
DeepSeek - assumed some pre-existing knowledge of what EML is. Unable to fetch the PDF from the link, and unable to consume the PDF via "Attach file".
Kimi - produced long output, then stopped and asked me to upgrade.
GLM - looks OK. |
TIL you can taunt LLMs. I guess they exhibit more competitive spirit than I thought.