by dastbe
110 days ago
From a dataplane perspective, it does mean your healthchecks are running from a different location than your proxy, so there is a risk that routability is impacted for proxy -> dest but not for healthchecker -> dest. For general reliability, you can create partitions of checkers and use quorum across partitions to determine the health state for a given dest. This also enables centralized monitoring to detect systemic issues with bad healthcheck configuration changes (i.e. are healthchecks failing because the service is unhealthy or because of a bad healthchecker?).

In industry, I personally know AWS has one or two health-check-as-a-service systems that they use internally for LBs and DNS. Uber runs its own health-check-as-a-service system, which it integrates with its managed proxy fleet as well as p2p discovery. IIRC Meta also has a system like this for at least some things? But maybe I'm misremembering.
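
To make the partition-plus-quorum idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not how any of the systems mentioned above work: the function names (`partition_verdict`, `dest_health`, `dissenting_partitions`), the partition labels, and the majority-vote rule inside a partition are all hypothetical choices. Each partition votes on the dest, the dest is healthy if enough partitions agree, and any partition that disagrees with the rest is flagged as a possible bad healthchecker.

```python
from collections import Counter


def partition_verdict(checks):
    """Majority vote among the checkers in one partition.

    Returns None when the partition reported nothing (assumed convention).
    """
    if not checks:
        return None
    healthy = sum(1 for ok in checks if ok)
    return healthy * 2 > len(checks)


def dest_health(partitions, quorum):
    """Dest is healthy if at least `quorum` partitions vote healthy.

    `partitions` maps a partition name to a list of per-checker booleans.
    """
    verdicts = {name: partition_verdict(c) for name, c in partitions.items()}
    healthy_votes = sum(1 for v in verdicts.values() if v is True)
    return healthy_votes >= quorum, verdicts


def dissenting_partitions(verdicts):
    """Partitions whose verdict differs from the most common verdict.

    These are candidates for "bad healthchecker" rather than "bad service".
    """
    reported = [v for v in verdicts.values() if v is not None]
    if not reported:
        return []
    majority = Counter(reported).most_common(1)[0][0]
    return sorted(
        name for name, v in verdicts.items() if v is not None and v != majority
    )


# Hypothetical results for one dest, as seen from three checker partitions.
results = {
    "az-a": [True, True, True],
    "az-b": [True, False, True],
    "az-c": [False, False, False],
}
healthy, verdicts = dest_health(results, quorum=2)
suspects = dissenting_partitions(verdicts)
# healthy is True (az-a and az-b vote healthy); suspects is ["az-c"]
```

Whether a dissenting partition means a broken checker or a real routability problem from that location is exactly the ambiguity described above, so in practice you would probably alert on it rather than act on it automatically.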