by empath-nirvana 846 days ago
This is just blameless post-mortems, and many, many places implement this.

Every organization will always have some "inadequate" employees, as well as perfectly adequate employees who sometimes make mistakes. If your organization requires that no employee ever make a mistake in order to operate safely, then you have serious problems.

The point of a statement like that is that you don't write a post-mortem that says: "Our company went off the internet because an employee had a typo in a host name. We fired the employee and the problem is solved." In reality, the problem is that you had a system that allowed a typo to go all the way into production.
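To make "a system that allowed a typo into production" concrete, here is a minimal sketch of the kind of fix a blameless post-mortem might produce: a pre-deploy gate that rejects any config pointing at a host that doesn't exist. Everything here is hypothetical: `KNOWN_HOSTS`, `validate_config`, and `deploy` are illustrative names, and a real pipeline would pull its host inventory from DNS or service discovery rather than a hard-coded set.

```python
import difflib

# Hypothetical inventory of hosts that actually exist. In a real pipeline this
# might come from DNS, a CMDB, or service discovery (assumption).
KNOWN_HOSTS = {"db-primary.internal", "cache-01.internal", "api.example.com"}


def validate_config(config: dict[str, str]) -> list[str]:
    """Return one error per config entry that points at an unknown host."""
    errors = []
    for service, host in sorted(config.items()):
        if host not in KNOWN_HOSTS:
            # Suggest the closest known host, since most such errors are typos.
            guess = difflib.get_close_matches(host, KNOWN_HOSTS, n=1)
            hint = f" (did you mean {guess[0]!r}?)" if guess else ""
            errors.append(f"{service}: unknown host {host!r}{hint}")
    return errors


def deploy(config: dict[str, str]) -> None:
    """Refuse to deploy if validation finds any problem."""
    errors = validate_config(config)
    if errors:
        # The typo stops here, in the pipeline, not in production.
        raise ValueError("deploy blocked: " + "; ".join(errors))
    print(f"deploying {len(config)} services")


if __name__ == "__main__":
    deploy({"database": "db-primary.internal", "cache": "cache-01.internal"})
    try:
        deploy({"database": "db-primray.internal"})  # note the typo
    except ValueError as exc:
        print(exc)
```

The employee who typed the hostname is the same person either way; what changes is that the mistake now fails in the pipeline instead of in front of customers.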


It's like that story of the pilot who, after his refueling technician almost caused a crash by using the wrong fuel, insisted that he always have that technician because they'd never make that mistake again.
That was the late, and definitely great, R.A. "Bob" Hoover; I am proud to have shared a beer with him at Oshkosh. His Shrike Commander was misfueled with jet fuel instead of avgas because it was mistaken for the larger turboprop model. Rather than blaming the individual refueler, he recognized that there was a systemic problem and developed an engineering solution. He proposed, and the industry adopted, mutually incompatible standards for jet fuel and avgas nozzles/receptacles. You can find some great YouTube material on him, as well as the film "Flying the Feathered Edge".

https://sierrahotel.net/blogs/news/a-life-lesson

https://en.wikipedia.org/wiki/Bob_Hoover#Hoover_nozzle_and_H...

https://www.imdb.com/title/tt2334694/

Here's an old-timey video of Bob in his prime. At 8:55 he flies a barrel roll with one hand while pouring himself a glass of iced tea with the other. The hardest part was pouring the tea backhanded so the camera had a good view. Then he finishes with his trademark no-engine loop, roll, and landing.

https://www.youtube.com/watch?v=PT1kVmqmvHU&t=510s

The question is what you do with the technician after the second mistake. That is to say, when does this logic break down?
That's not really the question:

Punishment culture assumes people naturally do bad, lazy things unless they are deterred by punishment and fear. Therefore we must punish mistakes.

That perspective has long been debunked. You don't see competent, skilled leaders using it. It turns out that generally people want to do well (just like you do), and they don't when they are scared / activated (in fight/flight/freeze mode), poorly trained, poorly supported, or poorly led. They excel when they feel safe and supported.

If you are the manager and the technician makes the same mistake the 2nd or 3rd time, you will find the problem the next morning in your bathroom mirror. :) At best, you have put them in a position to fail without the proper training or support. Leadership might also be an issue.

I would say that every skilled leader must use punishments and consequences to some degree.

If your tech gets drunk every day and doesn't do their job, you need to cut them loose. This isn't a management problem.

Sometimes people end up in positions they are not suited for and will continue to fail. If you hired a plumber and you need a doctor, that isn't an on-the-job training, support, or leadership issue.

> you need to cut them loose. This isn't a management problem.

That is 100% a management problem.

> Sometimes people end up in positions

I wonder how they got in those positions? That sounds like a management problem too.

It isn't always management's job to make the person work out in the role. Sometimes it is management's job to fire that person and find someone better.

Some people are bad fits for positions. They might look good on paper, they might be trying something new, they might lie to get hired, they might change after starting, they might have been a risky hire, or any number of reasons.

If you implemented some changes so the mistake is caught before disastrous consequences, you're already doing better. Well enough to let the 2nd one slide. Even the 3rd. After that, action seems reasonable. It's no longer a mistake, it's a pattern of faulty behavior.
That is a big IF. At some point it comes down to the error type, and if it is a reasonable/honest mistake.

The situation is very different if the fuel cans are hard to distinguish vs if the tech is lazy and falsifying their checklist.

Underlying any safety culture is a culture of integrity. No safety culture can tolerate a culture of apathy and indifference.

I expect there's precisely 1 safety culture that can tolerate a culture of apathy and indifference -- one in which no work is ever completed (without infinite headcount).

You apply risk mitigation and work verification to resolve safety issues.

Then you recursively repeat that to account for ineffective performance of the previous level of verification.

Ergo, end productivity per employee is directly proportional to integrity, as it allows you to relax that inefficient infinite (re-)verification.

Exactly! All this talk about man vs system misses the point that man is the system designer, operator, and component.

This is why Boeing can't just solve their situation with more process checks. From the reporting, they are already drowning in redundant quality systems and complexity. What failed was the human element.

Someone was gaming the system by saying that the doors weren't "technically" removed because there was a shoelace (or whatever) holding them in place, quality assurance was asleep at the wheel, and management was rewarding those behaviors.

Plenty of blame to go around.

Redesign the system again if it's unintentional. It is almost impossible to control humans to the degree that they never make mistakes. It's far better to design a system in which mistakes are categorically impossible.
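The Hoover nozzle upthread is the physical version of "make the mistake categorically impossible," and the same idea carries over to software: put the compatibility check inside the only path that does the dangerous thing, instead of relying on the operator to notice. A minimal sketch, assuming hypothetical `Fuel`, `Nozzle`, `Aircraft`, and `refuel` names invented for illustration. Note that this check fires at run time; a static type checker could push some errors like this even earlier, but that depends on how the types are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Fuel(Enum):
    AVGAS = "avgas"
    JET_A = "jet-a"


class FuelMismatch(Exception):
    """Raised when a nozzle cannot mate with an aircraft's receptacle."""


@dataclass(frozen=True)
class Nozzle:
    fuel: Fuel


@dataclass(frozen=True)
class Aircraft:
    tail_number: str
    fuel: Fuel  # the only fuel this aircraft's receptacle accepts


def refuel(aircraft: Aircraft, nozzle: Nozzle, gallons: float) -> str:
    """The only way to pump fuel. The compatibility check is built in, so
    stopping a mismatch doesn't depend on a checklist or operator attention."""
    if nozzle.fuel is not aircraft.fuel:
        raise FuelMismatch(
            f"{nozzle.fuel.value} nozzle does not fit {aircraft.tail_number}"
        )
    return f"pumped {gallons} gal of {nozzle.fuel.value} into {aircraft.tail_number}"


if __name__ == "__main__":
    plane = Aircraft("N12345", Fuel.AVGAS)  # hypothetical tail number
    print(refuel(plane, Nozzle(Fuel.AVGAS), 40))
    try:
        refuel(plane, Nozzle(Fuel.JET_A), 40)  # the Shrike Commander scenario
    except FuelMismatch as exc:
        print("blocked:", exc)
```

The design choice mirrors the hardware fix: there is no "refuel without checking" path for a tired or distracted person to take.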
I'm trying to push back on the knee jerk sentiment that there are no bad employees, only bad systems.

There are no systems that are human proof, and what kind of human behavior is tolerated is a characteristic of the system.

In fact, there are humans that lie, cheat, are apathetic, and incompetent. Part of a good system is to not only mitigate, but actively weed these people out.

For example, if someone falsifies the inspection checklist for your plane, you don't just give them a PIP.

> I'm trying to push back on the knee jerk sentiment that there are no bad employees, only bad systems.

Why is it important to you?

Because I'm an engineer in a quality-controlled field (medicine), and my personal experience is that firms place too much faith in quality systems and not enough emphasis on quality employees.

I see lots of engineers and QA following elaborate procedures with hundreds of checks, but not bothering to even read what they sign off on, so they can go golf all day.

People seem to think that you can engineer some process flow to prevent every error, but every process is garbage if the humans don't care or don't know what they are doing.

Every process is garbage if you don't hire workers with the skills that process demands. In an effort to drive down costs, lots of companies try to make up for talent with process, with poor results for both the companies and the patients. You can't replace a brain surgeon with 2 plumbers and twice the instructions.

Falsifying the inspection checklist is not an honest mistake.
Yes there are obviously bad employees but the line for actual incompetent/malicious employee is a lot further away than most people understand.

A lot of bad management is hand-waved as crappy employees (by management - shocking!)

I think that scales very much with the complexity of the task.

If you are talking about someone who can't serve coffee, the balance is clearly in favor of poor management over inadequate skills and trainability.

If you are talking about very specialized skills like aerospace engineering, I think the balance can move further in the other direction.

There is also the combination of the two, where in the interests of growth or cost savings, an organization has cut corners on the quality of talent hired.

I think that this anecdote [0] is appropriate for showing the glaring disconnects that can exist in the human<-->system symbiosis.

[0]: https://www.controlinmotion.com/news/news-archive/a-little-h...

It seems simple: "oh, the technician keeps messing up."

Did the technician mess up (sometimes true), or were they doing their job in good faith - was it the system/protocol/organization that made the task mistake prone? Did someone else actually mess up but the situation made it look like it's the technician's fault? Does this technician do a task/service that is failure prone? Are there other technicians on other tasks that are far less failure prone? Here the former technician would seem poor, the latter, excellent, but it's a function of the task/role and not the person.

I've been "the technician." I catch a lot of blame because people know I'm anti-blame culture, so I'd rather take the blame on myself than point my finger at the next guy in line. I'm also willing to take on high-risk tasks for the greater good, even if they suck and are blame-prone / risky. I believe in team culture in this way. If the organization doesn't respect that belief and throws me under the bus, I leave, which is quite punishing for them since they remain completely unaware of a major internal problem. If an organization "sees me" and my philosophy, then together we get very, very good at optimizing the system to minimize the likelihood of failures / mistakes.

Well certainly not after the first time at least

Imo it's a function of time, company and team culture, severity, and role guidelines.

If an employee makes a mistake but followed process, and no process change occurred, that's just the cost of doing business imo, and it could happen an unbounded number of times so long as the employee is acting in good faith.

My point is that good faith and sufficient competence are crucial. If the employee didn't care if the plane crashed, they are a bad fit.

If they can't read the refueling checklist, they are a bad fit.

Ideally you have system controls to screen and weed these people out too.

> a function of ... severity

Not severity; that sort of thinking is actually part of low-safety cultures. A highly safe culture requires the insight that people don't behave differently based on outcome. In fact, most people can't assess the severity of their work (this is by design; for example someone with access to the full picture makes the decisions so that technicians don't have to). So they couldn't behave differently even if they did somehow make better decisions when it matters.

But, and I'll reiterate the point for emphasis, people make all their decisions using the same brain. It is like bugs; any code can be buggy. Code doesn't get less buggy because it is important code. It gets less buggy because it is tested, formally verified, battle scarred, well specified and doesn't change often.

Would s/severity/impact/g also be counterproductive of safety culture? Genuinely trying to learn here, gotta be responsible/accountable and all.

Maybe impact relative to carelessness/aloof-ity?

I agree that an engineer/person will not behave differently based on outcomes, but if they know in advance that something can have a wide, destructive blast radius if some procedure is not followed, I feel there's a bit more culpability on the part of the engineer. Regardless, I don't feel I have a sufficient grasp on the concept I'm trying to define, so I definitely agree I shouldn't have included 'severity' in the function definition, nor any alternative candidate.

You put him into a boolean AND with another employee for quality (so a second person has to check his work) and put him on an improvement plan?
Maybe. Or maybe you turn them over to the authorities because the second time, their lazy and reckless disregard killed several people.
Exactly. https://asteriskmag.com/issues/05/why-you-ve-never-been-in-a... is a great article illustrating this in the airline industry itself.
> When in reality the problem is that you had a system that allowed a typo to go all the way into production.

That's a typical root cause, and is exactly what should come out of good post-mortems.

But human nature is human nature...