Hacker News new | ask | show | jobs
by donavanm 2719 days ago
For #2 canaries are commonly run against the stable/“production” deployment on some sort of periodic basis. This is used to approximate a customer experience and detect faults in underlying components, intermediate infrastructure, or changes outside of “the software” like configuration data. Its an adjunct or backstop to metric based anomoly detection. From what Ive seen.

Edit: as an aside theres a very interesting area of discusssion around the spectrum of integration tests, canaries, & user experience monitoring. If you change the periods and the sources they seem to blend in to the same outcome. Ie write your integ tests to cover the ux. Run them continuosly. Associate results with underlying inputs. Suddenly theyre very much the same thing.

2 comments

I suspect you are confusing the terms.

What Google/Netflix call a "Canary" other companies call "deploying a single host/percentage of production traffic with a new version". When other companies talk about canaries they mean regular tests against production to detect issues.

Youre correct. I didnt catch the distinction when I first skimmed the article and parent comment. Part of it is that my active “canary” tests themselves emit relevant TSD indicative of system performance.

The general concept outlined Id lump in to “approval workflows.” The gradual, intentional, deployment of mixed versions to the same workload Id call something like A/B or red/blue version deployments.

At least from what I've seen red/green (I've heard blue/green) or A/B deployments represent a different thing. A blue/green deployment says you have 2 environments, each able to handle all of your traffic. So you have 2x the servers you need running, and move traffic between environments to upgrade. Its double buffering, but with binary versions.

The (traffic) canarying process that Google and Netflix use, and that is described in this article is distinct from that, since you don't need a significant amount of overhead.

Huh. Nomenclature. FWIW Ive also never heard of A/B being limited to binary or requiring full N sets of resources. I've only seen it as small subsets of traffic that is ramped up to some confidence interval. Similarly two concurrent variants is the simplest and minimal value. But Ive also seen literally thousands of concurrent variants with enough workload & consumers. Agree on overhead, as it's essentially a version management + stable routing problem you dont/shouldnt increase resource requirements.
Yup, the key point is they let you treat complex systems as a blackbox that either does or does not successfully support the end-user functionality/experience. Ideally the components of the blackbox are sufficiently self-regulating that the end-to-end functionality is never interrupted, but complex systems by their nature exhibit emergent behavior that may not register as anomalous at a per-component level.

Totally agree the nuances regarding levels of coverage between various end-to-end testing mechanisms is an interesting subject. Generally speaking I think it's preferable for canaries to essentially be continuously-run integ tests, but this is not universally applicable. Deep integ tests may be too resource-intensive to run continuously. Integ tests also aren't always ideal for monitoring behavior of long-lived resources as they typically start from a blank slate, create some new resources, test their functionality, and then clean them up. Canaries may be geared towards validating state that persists indefinitely by design.