With regard to web-based services, once you’ve got the ability to do canary testing, IMO flags/toggles are less compelling: they mean busier code and logic you’ll have to pull out later.
Canarying gets you a 1/n treatment group, but it might be skewed geographically (e.g., all affected users are near the canary’s datacenter). You need a percentage in a feature flag if 1/n is too big and you want, e.g., 0.1% of traffic.
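To make the percentage idea concrete, here is a minimal sketch of a percentage-based flag check, assuming you have a stable user ID to hash. The function name `in_rollout`, the `BUCKETS` constant, and the flag name are hypothetical, not from any particular flag library. Hashing the user ID spreads the treatment group across all users rather than tying it to whichever hosts happen to be canaries.

```python
import hashlib

# Hypothetical resolution: 100,000 buckets allows steps of 0.001%.
BUCKETS = 100_000


def in_rollout(flag_name: str, user_id: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    The flag name is mixed into the hash so that different flags
    select different (uncorrelated) groups of users.
    """
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes of the hash to a stable bucket number.
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    # Users in buckets below the threshold get the new behavior.
    threshold = round(percent / 100 * BUCKETS)
    return bucket < threshold


if __name__ == "__main__":
    # Example: expose a hypothetical "new-checkout" flag to 0.1% of users.
    users = [f"user-{i}" for i in range(50_000)]
    exposed = sum(in_rollout("new-checkout", u, 0.1) for u in users)
    print(f"{exposed} of {len(users)} users see the new code path")
```

Because the bucket depends only on the flag name and user ID, a given user gets the same answer on every request, and raising the percentage only adds users to the treatment group without removing anyone already in it.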
I agree that if you have only a few changes going to prod, deploy fast, and do canary testing, you should be covered. In my experience that's rarely the case, because multiple teams deploy changes at the same time, and even deployments in external services can cause side effects in other services.
Emergent inter-service issues are challenging to deal with regardless.
I’ve absolutely seen canary testing work in large environments with a lot of teams doing frequent deploys. The teams need to have the tooling to conduct their own canary testing and monitoring.
As soon as you’re involving external services or anything persistent, you may not be able to undo the damage of misbehaving software by simply disabling the offending code with a flag.
In practice the cost/benefit of feature flags has never proven out for me; it's better to just speed up your deploys/rollbacks. The caveat is that I’ve only ever worked in web environments. I can imagine that for software running on an end-user device, flags could solve some difficult problems, provided you have a way to toggle the flag.