That's a very interesting page, but the language ranking is wildly different for "average percentage" (Python at the bottom) and "success rate" (Python second). Sounds like there is some subtlety here.
Success rate is essentially loading/compilation success + ability to adhere to the environments' rules.
For one-shot responses, most failures are environment or syntax errors, which naturally favors interpreted languages. For longer agentic coding sessions, models solve the environment issues quickly and it becomes a fair comparison of who comes up with the smarter solution. You can filter for that here: https://gertlabs.com/rankings?mode=agentic_coding
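A toy sketch of how the two columns can disagree. The numbers are made up, and the metric definitions (especially averaging the score only over runs that loaded) are assumptions for illustration, not the site's actual scoring:

```python
from collections import defaultdict

# Made-up runs for illustration only: (language, ran_ok, score_percent).
# ran_ok=False stands for a load/compile failure or a rule violation.
RUNS = [
    ("python", True, 40.0),
    ("python", True, 50.0),
    ("python", True, 45.0),
    ("python", True, 45.0),
    ("rust", False, 0.0),
    ("rust", False, 0.0),
    ("rust", True, 90.0),
    ("rust", True, 100.0),
]


def summarize(runs):
    """Return per-language success rate and average score over successful runs."""
    by_lang = defaultdict(list)
    for lang, ran_ok, score in runs:
        by_lang[lang].append((ran_ok, score))

    summary = {}
    for lang, rows in by_lang.items():
        ok_scores = [score for ran_ok, score in rows if ran_ok]
        summary[lang] = {
            # Fraction of runs that loaded and followed the rules.
            "success_rate": len(ok_scores) / len(rows),
            # Assumed definition: mean score over successful runs only.
            "avg_score_when_ok": sum(ok_scores) / len(ok_scores) if ok_scores else 0.0,
        }
    return summary


SUMMARY = summarize(RUNS)
```

With this data the interpreted language wins on success rate while the compiled one wins on score once it actually runs, which is the kind of split described above.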