by not_kurt_godel
2719 days ago
> 2. I'm not sure I understand here. What version of your application is your canary running if you're not using it to validate a new version? The same as the baseline? But then, what are you using it for? We run canaries continuously against all stages of our pipelines, including prod. Once a change is deployed to a stage (initially triggered by commit to git repo), there is a time window where canary failures will trigger approval workflow failure and rollback (note that we additionally have separate integration tests that also run as part of this approval workflow). Canary failures at other times trigger steady state alarms. Thus, our canaries serve both to validate new versions and continuously monitor the existing versions.
This is bad.
Canaries are not alerts. Alerting should be separate from canarying. If you're getting steady-state alerts from your canaries, your steady-state alerting is bad.
A system that isn't undergoing change should be able to disable its canary entirely, and you should still be confident that any changes in traffic or whatnot will be caught by other alerting tools. If not, then your canaries aren't correctly balanced, and you're getting canary failures due to traffic imbalances or something else that shouldn't affect a good canary. In other words, if your canary is alerting in steady state, your experimental setup is invalid and you don't have a good test/control pair. If you did, the only steady-state alerts you'd get would be noise.
The same experimental setup can't both control for production differences (to isolate changes caused by binary version bumps) and detect changes in production traffic that are independent of binary version bumps. At least one of those is broken.
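
To make the distinction concrete, here is a rough Python sketch of what I mean. Everything in it is hypothetical: the function names, the `max_ratio` of 1.5, the `rate_floor`, and the 5% absolute threshold are made-up illustration values, not anyone's real tooling. The canary check only looks at the canary relative to a baseline serving the same traffic, while the steady-state alert only looks at absolute error rate.

```python
def canary_verdict(baseline_errors, baseline_requests,
                   canary_errors, canary_requests,
                   max_ratio=1.5, rate_floor=0.001, min_requests=100):
    """Relative check: is the canary worse than a baseline seeing the same traffic?"""
    if baseline_requests < min_requests or canary_requests < min_requests:
        return "insufficient_data"
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    # The floor keeps a near-zero baseline from making any single error look like a regression.
    allowed = max_ratio * max(baseline_rate, rate_floor)
    return "fail" if canary_rate > allowed else "pass"


def steady_state_alert(errors, requests, max_error_rate=0.05):
    """Absolute check: is the service as a whole unhealthy, regardless of version?"""
    if requests == 0:
        return False
    return errors / requests > max_error_rate


# Traffic problem hitting both arms equally: canary stays quiet, alerting fires.
traffic_canary = canary_verdict(100, 1000, 10, 100)
traffic_alert = steady_state_alert(100 + 10, 1000 + 100)

# Bad binary in the canary only: canary fails, fleet-wide alerting stays quiet.
binary_canary = canary_verdict(10, 1000, 5, 100)
binary_alert = steady_state_alert(10 + 5, 1000 + 100)
```

With a well-balanced test/control pair, a production-wide shift moves both arms together and the ratio stays near 1, which is exactly why the canary shouldn't be the thing that pages you in steady state.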