| Huh, according to that model card, this is a 137B total parameter model. Performance doesn't seem that good: MAI-Code-1-Flash (137B-A5B) = 51% on SWE-bench Pro; Qwen3.6-35B-A3B = 49.5% on SWE-bench Pro (https://huggingface.co/Qwen/Qwen3.6-35B-A3B). They benchmark against Claude Haiku, but Haiku is not good: it's worse than tiny open models you can run locally or via API at 10% of the cost. |
On benchmarks: in the same VS Code harness, MAI-Code-1-Flash scored 51.2% on SWE-bench Pro vs. Haiku's 35.2%, a 16-point gap that we see as a pretty big leap. Going forward, we'll include more models in our benchmarks, such as Qwen 3.6 and Gemma 4.