|
|
|
|
|
by hayst4ck
1283 days ago
|
|
When you give responsibility to teams themselves the result is O(1) size problems becoming O(teams) sized problems. > How can I check the health of the service?
In the definition of service, you define a field for
health check script.
> How can I safely and gracefully restart the service?
This will exist within the script used to push new code.
> Does it has any external dependencies?
This could be defined in the service configuration and
used for setting up integration tests and automatically
generating a dependency dashboard.
> Do you have a playbook, or sequence of steps, to bring
the service back up?
You could generate a field in the service defintion to
automatically generate a dashboard and include the
playbook link at the top of the page.
> Do you use appropriate logging levels depending on the
environments?
Production could be extremely opinionated about what
acceptable logging looks like, forced via code review. Log
level could be defined in service config.
> Are you logging to stdout?
Why would any production service get to choose?
Service owners shouldn't be able to log into machines.
> Are you measuring the RED signals?
Required fields in service config that could be used to
generate a service dashboard.
> Is there any documentation/design specification for the
service?
Required config field.
> Are you using gRPC or REST?
Trivial grep.
> How does the data flow through the service?
This is complicated, but can probably be easily replaced
by asking what state your service keeps and how it's
stored. This is the only question I think the author
should/needs to ask.
> Do you have any PII/Sensitive data flowing through the
service?
While this question is important, this is one of the
problems that has to be a particular person's
responsibility. Any dev that answers anything but
"probably not, but I don't know" shouldn't be trusted.
> What is the testing coverage for this service?
Some form of this would exist in a service config.
I don't think the question of responsibility is as simple as "it's the team's problem." |
|
I’m not sure I follow what it is.
A technical construct, like a code template or a API that services implements ?
Or a process constructs, like a SOP to follow with checkboxes?
Thanks