by mowfask 1503 days ago
My approach to solving problems that span complex systems:

1. Instrument 2. Measure 3. Interpret 4. Act

Iterate as necessary.

I have come to see this pattern working on electronics design, embedded software, industrial control systems, networking and webapp backends.

Breaking it down:

1. Instrument

Understand subsystems and their interfaces. Use tooling around these interfaces to trace the interplay between subsystems. Make sure all tooling is time-synchronized so you can correlate information across tools via timestamps. If you can't instrument remotely, bite the bullet and reproduce locally. This ties back into the design phase: design interfaces to be instrumentable, ideally remotely. Think test points on PCBs, traceable APIs in software, and network protocols that tools like Wireshark can decode. Pub/Sub systems are great for this, as you can easily add another subscriber for instrumenting all communication. Don't rely on "what happens to be available" for instrumentation. AWS CloudWatch will miss that one crucial piece of information. Your oscilloscope tip will not make reliable contact on a QFN pad. Simply stated: Become good at interfaces and make them accessible.
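
To make the Pub/Sub point concrete, here is a minimal sketch of an instrumentation subscriber. Everything in it is hypothetical and only for illustration: `Bus`, `tap`, `Trace` and the topic names are made up, and a real system would use its broker's own wildcard or monitor mechanism instead of this in-process toy. The idea is that the tap sees every message and stamps it from one shared clock, so the trace can later be lined up with data from other tools.

```python
import time
from collections import defaultdict
from typing import Any, Callable

# A handler receives the topic and the payload of each message.
Handler = Callable[[str, Any], None]


class Bus:
    """Tiny in-process pub/sub bus (hypothetical, for illustration only)."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> list of handlers
        self._taps = []                 # handlers that see every topic

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def tap(self, handler: Handler) -> None:
        """Register an instrumentation subscriber for all topics."""
        self._taps.append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        # Taps run first so the trace is recorded even if a
        # regular subscriber raises.
        for handler in self._taps:
            handler(topic, payload)
        for handler in self._subs[topic]:
            handler(topic, payload)


class Trace:
    """Records (timestamp, topic, payload) for every message it sees."""

    def __init__(self, clock: Callable[[], float] = time.time):
        # Assumption: the same clock source is used by all other tools,
        # otherwise the timestamps cannot be correlated.
        self.clock = clock
        self.records = []

    def __call__(self, topic: str, payload: Any) -> None:
        self.records.append((self.clock(), topic, payload))
```

Adding the tap is one line (`bus.tap(Trace())`) and the existing subscribers don't need to change, which is what makes this kind of interface cheap to instrument.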

2. Measure

Take the time to properly run tests and gather data. For issues in systems spanning mechanical, electrical, digital and software domains, you won't have one tool to do it all for you. Data preparation and cross correlation will be a manual process in most cases. That is ok.
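
The manual cross correlation usually boils down to something like the sketch below: take events from one tool and find the nearest sample from another tool by timestamp. This is only an illustration under assumptions I'm making up here: both logs are lists of `(timestamp, value)` tuples on the same time base, and `correlate`, `nearest_index` and `tolerance` are hypothetical names, not from any particular tool.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Index of the entry in sorted_times closest to t (list must be non-empty)."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    # On a tie the earlier sample wins.
    return i if after - t < t - before else i - 1


def correlate(events, samples, tolerance):
    """Pair each (time, label) event with the nearest (time, value) sample.

    Returns (time, label, sample) tuples; sample is None when nothing
    lies within `tolerance` seconds of the event.
    """
    samples = sorted(samples, key=lambda s: s[0])
    times = [ts for ts, _ in samples]
    paired = []
    for t, label in sorted(events, key=lambda e: e[0]):
        match = None
        if times:
            j = nearest_index(times, t)
            if abs(times[j] - t) <= tolerance:
                match = samples[j]
        paired.append((t, label, match))
    return paired
```

For example, `events` could be resets from a firmware log and `samples` a supply voltage export from a scope. Events that come back with `None` are worth a look too: they tell you where your measurement has gaps.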

3. Interpret

This is about understanding your problem and digging down from high-level symptoms to low-level root causes. Don't jump to conclusions. Let the data sink in to identify second-order effects. Don't rush it because of pressure from your boss or the customer.

4. Act

Now that you understand your problem at a deeper level, it should be straightforward to apply corrective action. This might not solve the issue yet, but you will get closer to the root cause.

Two notes:

* Never stop after step 4! Always iterate once more so you can be confident the issue is actually solved and not just hidden by some effect.

* If you're a team player, document each step. A short note and screenshot in an issue tracker go a long way.

About engineering mindsets:

I find it infuriating when people calling themselves engineers don't follow any practice like this. Yes, you can solve problems through sheer experience or by hitting your head against the wall for long enough. Alone. On simple systems. But working together on complex systems you have to apply some methodology. Doesn't have to be my methodology, just not no methodology. For me a big red flag is when engineers don't understand why something works. Not understanding why something doesn't work is ok. We are human and systems are complex. But getting something to work, wondering why it does and then sending it to the customer? That's not engineering, that's tinkering. It's asking for trouble.

1 comments

> I find it infuriating when people calling themselves engineers don't follow any practice like this.

The wisdom of Andy Grove (the engineering legend) can be summed up in one sentence: "Everything is a process, which can be improved."

If you don't have a process for solving a problem (where problem is as implied by the parent), you are not engineering.

I agree. Note also that if your process resembles sacrificing a goat you might also not be engineering (a process is necessary but not sufficient).

IMO an alternative way of putting it is you need to apply the scientific method.

https://en.wikipedia.org/wiki/Scientific_method

edit: grammar