The list of tools that Pythonheads present as the definitive solution to their problems changes every year, yet the results are still far behind Rust/Scala/Kotlin/C#.
I've tested many flows involving linters. The results are far from ideal: agents tend to work around linters, mass-add ignore annotations, and so on, especially when fixing one warning or error triggers another (and that happens regularly).
You're devolving into the realm of "What if we tell the agent to just get it right?"
Relying on the prompt to ensure the code it writes is correct is where things fail. Types, tests, linting, etc. are deterministic tools the agents tend to respect.
They tend to ignore such instructions on the first circular issue. Even with Opus you have to kick it really hard, insist on generalization, and intervene manually. In my opinion this is not a productive or workable approach for large projects.
Typical failure mode: "I fix pyright error A, it causes pyright error B, pyright is broken, I will exclude both A and B through pyright config and will add ignore annotations for both A and B and will write a couple of idiotic comments about that".
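One way to at least catch this failure mode mechanically is to count suppression comments before and after the agent's change and reject the change if the count went up. This is a minimal sketch, not something from my actual setup: the helper names `count_suppressions` and `new_suppressions` are made up for illustration, and the regex is deliberately naive (it also matches the same text inside string literals, and it does not look at config-level excludes).

```python
import re

# Matches the common inline suppression comments:
# "# type: ignore", "# pyright: ignore[...]" and "# noqa".
SUPPRESSION = re.compile(r"#\s*(?:type:\s*ignore|pyright:\s*ignore|noqa)\b")


def count_suppressions(source: str) -> int:
    """Count lines in `source` that carry at least one suppression comment."""
    return sum(1 for line in source.splitlines() if SUPPRESSION.search(line))


def new_suppressions(before: str, after: str) -> int:
    """Return how many suppression lines a change added (never negative)."""
    return max(0, count_suppressions(after) - count_suppressions(before))
```

Feed it the old and new contents of each touched file and fail the run when `new_suppressions` is non-zero. It does not make the agent fix the underlying error, it only stops the "ignore both A and B" shortcut from passing silently.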
I tried that on a recent project. My conclusion: don’t use Python if you value your sanity. Ruff etc. are not type checkers, they’re the cargo cult equivalent of what a Python developer imagines a type checker would be like if they had one.