Thanks, I'll give it a try. Plandex's model settings are version-controlled like everything else and play well with branches, so it will be fun to compare how different models do against each other on longer coding tasks, using a branch for each one.
For challenging tasks, I typically get code outputs from all three top models (gpt4, opus, and ultra) and pick the best one. It would be nice if your tool could simplify this for me: run all three models and perhaps even facilitate some kind of interaction between them to produce a better outcome.
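To make the "run all of them and pick the best" workflow concrete, here is a rough Python sketch. It is not how Plandex works or any real model API: `fan_out`, `pick_best`, `passes_check`, and the three stub "models" are all hypothetical names, and the stubs just return canned strings in place of real model calls. The scoring rule (run the candidate and check one test case) is also only an assumption about what "best" might mean.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A "model" here is any function from prompt to generated code (hypothetical).
ModelFn = Callable[[str], str]


def fan_out(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the same prompt to every model in parallel; collect outputs by name."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


def pick_best(outputs: Dict[str, str], score: Callable[[str], float]) -> str:
    """Return the name of the model whose output scores highest."""
    return max(outputs, key=lambda name: score(outputs[name]))


def passes_check(code: str) -> float:
    """Toy scorer: 1.0 if the candidate defines add() and add(2, 3) == 5.

    Running generated code with exec() is only acceptable in a toy example;
    a real setup would need sandboxing.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        return 1.0 if namespace["add"](2, 3) == 5 else 0.0
    except Exception:
        return 0.0


# Stub models returning canned strings (stand-ins for real API calls).
models: Dict[str, ModelFn] = {
    "model_a": lambda prompt: "def add(a, b):\n    return a - b\n",  # wrong result
    "model_b": lambda prompt: "def add(a, b):\n    return a + b\n",  # correct
    "model_c": lambda prompt: "def add(a, b)\n    return a + b\n",   # syntax error
}

outputs = fan_out("Write add(a, b) that returns the sum.", models)
best = pick_best(outputs, passes_check)
print(best)
```

The "model interaction" part could be layered on top, for example by feeding each model the other candidates and asking for a revision, but that is left out here because it depends heavily on the tool.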