| HN Mirror

Success rate is essentially loading/compilation success + ability to adhere to the environments' rules.

For one-shot responses, the majority of failures are environmental/syntax, which naturally favors interpreted languages. For longer agentic coding sessions, models solve the environment issues quickly and it becomes a fair comparison of who comes up with the smarter solution. You can filter for that here: https://gertlabs.com/rankings?mode=agentic_coding