| HN Mirror


by bradleyy 1234 days ago

We don't exactly celebrate failure, but:

  1) We have a culture that accepts "failures happen". People screw up, sometimes badly.
  2) Imperative to this is a culture of ownership. You hide failures, your job is at risk. You own it, go public, get help, and all is well. Well, not exactly well, since there was a failure, but you get the picture.
  3) Blameless post-mortems (hate that term, nobody died)
  4) We do have a tongue-in-cheek slack channel for those folks who have caused a "p0", i.e. most-severe-breakage

I'd not want to work somewhere that didn't have the above; it's just part of being a mature organization.

4 comments

hackmiester 1234 days ago

#4 does more good than most realize. If you can’t joke around about it then you aren’t really comfortable with it. And we all must be comfortable with the potential to cause problems occasionally. You have to break a few eggs to make an omelet.


madeofpalk 1234 days ago

> post-mortems (hate that term, nobody died)

Post-incident review

https://support.atlassian.com/jira-service-management-cloud/...


OJFord 1234 days ago

We say our battery's 'dying', or that the site is 'dead', 'live', or 'alive again', though, so 'post-mortem' makes sense in that case. I understand the objection, but I think you'd have to object to the rest too.


madeofpalk 1234 days ago

We declare an "incident". After it, you review the "incident". The site might not have even "died", but you still have an incident.


OJFord 1233 days ago

I agree it makes more sense, from first principles as it were, I'm just saying 'post-mortem' is consistent with other usages, so if we don't like it then we probably shouldn't like 'the server is dead' or 'the site is on its knees' etc. either.


bradleyy 1234 days ago

I'd also add that high-performing teams require psychological safety (e.g. mistakes don't get punished); the research is pretty clear. So punishment explicitly blocks high performance.


mytailorisrich 1234 days ago

It depends what kind of mistakes and whether they are repeat mistakes.

Everyone makes mistakes, sure, and that should be accepted, but if they are caused by carelessness or incompetence, or if the same mistake keeps being repeated then there ought to be consequences.

A safe environment does not mean anything goes.


bradleyy 1233 days ago

I honestly believe that this is an exception. Most people aren't going to be repeat offenders if proper blameless post-mortems/root cause analysis are performed. It will get to the roots of what actually caused it, and that can provide for a training opportunity.

Incompetence is generally caused by a lack of training, and carelessness is driven by culture. Yes, there really are cases where an individual is a problem, and potentially even needs to be fired. But most of the time it's the organization and its culture, policies, etc. that are truly at fault.


maayank 1233 days ago

I think to a high degree it depends on whether the individuals could realistically prevent new instances given their resources (e.g. access, head count, number of other high-priority tasks, etc.).


drewcoo 1234 days ago

Failure doesn't have to mean "people screw up" and one would hope there'd be something in place to avoid failing badly.

Failing early and often is a healthy approach to learning. That doesn't shame people or cause serious harm.
