Composer 2 performed differently on evals than Moonshot.ai's coding models; Cursor claims theirs is better than Claude Opus 4.6: https://x.com/fynnso/status/2034706304875602030 / https://archive.vn/bVtik. And, per Lee Robinson (a Cursor employee), it is very likely that Cursor will build its own foundation model for Composer 3.
Kimi works great in their CLI, but that CLI has a number of workarounds for quirks of their models. One of them detects when the model gets into a loop, reverts to a checkpoint, and lets the model compose a "message" to its past self (search their CLI for "BackToTheFuture"). Kimi doesn't work so well in a harness that doesn't take those quirks into account.
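To make the loop-and-rewind idea concrete, here is a minimal sketch of how a harness could do it. This is not the Kimi CLI's code: the `Harness` class, its method names, and the "three identical tool calls in a row" rule are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical agent harness; none of these names come from the Kimi CLI."""
    window: int = 3  # assumed rule: this many identical consecutive actions is a loop
    history: list = field(default_factory=list)      # conversation messages
    actions: list = field(default_factory=list)      # tool calls made so far
    checkpoints: list = field(default_factory=list)  # (len(history), len(actions)) pairs

    def checkpoint(self) -> int:
        """Remember the current state and return an id for it."""
        self.checkpoints.append((len(self.history), len(self.actions)))
        return len(self.checkpoints) - 1

    def record(self, action: str) -> None:
        """Log a tool call in both the action list and the conversation."""
        self.actions.append(action)
        self.history.append(f"tool call: {action}")

    def is_looping(self) -> bool:
        """True if the last `window` actions are all identical."""
        tail = self.actions[-self.window:]
        return len(tail) == self.window and len(set(tail)) == 1

    def rewind(self, checkpoint_id: int, note: str) -> None:
        """Drop everything after the checkpoint, then leave a note for the 'past self'."""
        history_len, actions_len = self.checkpoints[checkpoint_id]
        del self.history[history_len:]
        del self.actions[actions_len:]
        del self.checkpoints[checkpoint_id + 1:]
        self.history.append(f"note from your future self: {note}")


# Usage: the model repeats the same tool call, so the harness rewinds.
h = Harness()
h.history.append("user: fix the failing test")
cp = h.checkpoint()
for _ in range(3):
    h.record("run_tests()")
if h.is_looping():
    h.rewind(cp, "re-running the tests without editing code changes nothing")
```

After the rewind the model sees only the original request plus the note, so it restarts from a clean context but with a hint about what not to repeat. A harness without something like this just lets the loop run.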
Composer is really good, but just like any Chinese model it needs a good plan. It's cheap and fast: in one month of Pro I used the equivalent of $500 in API credit for it.
Shaming others when all AI is trained on scraped content and code, huh? Many of those sources were either scraped in breach of ToS or are illegal themselves, such as Anna’s Archive. Bold move. And Chinese models in particular have been accused of distilling from American models.