by llamaimperative 671 days ago
"Everyone knows this" yet you don't actually see organizations work like this almost ever in practice. Especially not software organizations. Instead you see big splashy initiatives to "overhaul performance" or whatever.

There is definitionally no more efficient way to improve the performance of a system than sequentially targeting each new bottleneck in its performance and nothing else.

Your example is just a case of picking a non-ideal method to improve a bottleneck. It's a lot easier to get this right when you're focusing on one important problem instead of generally "optimizing everything," which produces a clear incentive to take easy, immediately available, probably-not-ideal solutions all over the place.


> There is definitionally no more efficient way to improve the performance of a system than sequentially targeting each new bottleneck in its performance and nothing else.

You are making an extremely strong statement here. How do you define "bottleneck" for this to be true? Many slow systems don't have bottlenecks; they are just slow overall.

And this is the key insight that "everyone knows" and you apparently do not: every system has exactly one bottleneck at any given point in time. The bottleneck can move, alternate, or be so close to other near-bottlenecks that it's hard to spot, but there is exactly one.

Your perception that "there are no bottlenecks" is exactly the perception Deming set out to disprove.

Riddle me this: how can a system perform faster than its single slowest component?

It cannot. Ergo, there is a single bottleneck that sets the pace of the entire system.
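A minimal sketch of that claim, assuming a serial pipeline where every item passes through each stage and each stage has a fixed processing rate. The stage names and rates are hypothetical, picked only for illustration.

```python
def system_rate(stage_rates):
    """Steady-state output rate of a serial pipeline: the rate of its slowest stage."""
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Name of the stage with the lowest rate (the first one wins on a tie)."""
    return min(stage_rates, key=stage_rates.get)


# Hypothetical rates, in items per hour.
rates = {"intake": 120, "triage": 40, "fix": 15, "review": 60}
print(bottleneck(rates), system_rate(rates))  # fix 15

# Doubling a non-bottleneck stage leaves the output unchanged.
faster_review = {**rates, "review": 120}
print(system_rate(faster_review))  # 15

# Speeding up the bottleneck raises output until the next-slowest stage takes over.
faster_fix = {**rates, "fix": 50}
print(bottleneck(faster_fix), system_rate(faster_fix))  # triage 40
```

Under these assumptions the output rate only changes when the slowest stage changes, and the bottleneck then moves to whichever stage is slowest next.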

> every system has exactly one bottleneck at any given point in time.

Consider this system that has 5 sequential steps with these durations:

Step 1: 10 seconds

Step 2: 5 hours

Step 3: 7 seconds

Step 4: 5 hours

Step 5: 18 seconds

It would seem that step 2 and step 4 are both bottlenecks. Are you saying that in reality the two steps would not typically have exactly the same duration, so one of them would be considered the actual bottleneck?

In this example, assuming sequential steps, if step 2 must be performed before step 4, then it is step 2 which is the bottleneck.

After step 2 has been optimized, step 4 becomes the new bottleneck, assuming that the optimization of step 2 is satisfactory.

While both steps 2 and 4 contribute to a slow system, a bottleneck means something else entirely: it is the single most significant point of slowdown for the rest of the process.

To put it another way, it’s hindering the overall execution. If both use the same amount of time, then whichever is closer to the front of the process is by definition hindering more of the process.
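Here is a rough sketch of the five-step example with that tie-breaking rule (the earliest of the equally long steps counts as the bottleneck). The durations come from the example; the one-hour figure for the improved step 2 is a made-up assumption.

```python
from datetime import timedelta

# Durations from the five-step example above.
steps = [
    ("Step 1", timedelta(seconds=10)),
    ("Step 2", timedelta(hours=5)),
    ("Step 3", timedelta(seconds=7)),
    ("Step 4", timedelta(hours=5)),
    ("Step 5", timedelta(seconds=18)),
]


def total_time(steps):
    """End-to-end time when the steps run strictly one after another."""
    return sum((duration for _, duration in steps), timedelta())


def first_bottleneck(steps):
    """Longest step; max() returns the earliest one when durations tie."""
    return max(steps, key=lambda step: step[1])[0]


print(total_time(steps))        # 10:00:35
print(first_bottleneck(steps))  # Step 2

# Hypothetical improvement: step 2 is cut from five hours to one.
improved = [(name, timedelta(hours=1) if name == "Step 2" else duration)
            for name, duration in steps]
print(total_time(improved))        # 6:00:35
print(first_bottleneck(improved))  # Step 4
```

Once step 2 is shortened, step 4 is the longest remaining step and becomes the next target.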

> every system has exactly one bottleneck at any given point in time

What, no they don't. Does a straight glass have a bottleneck? No, most bottles have it, but not straight glasses, hence not every system has a bottleneck.

The same applies to IT systems: there the topology is much more complex, so a system can often have many bottlenecks, or sometimes fewer.

> Riddle me this: how can a system perform faster than its single slowest component?

A perfectly optimized component can't be a bottleneck but can still be the slowest component, trying to optimize that further will not speed up the system at all.

Here we see that you will miss a lot of optimization opportunities, since you think the slowest component is the bottleneck and do not look further.

I don't find the glass <> IT system analogy compelling (or even sensical) at all.

Describe to me how an IT system can produce results (e.g. tickets closed, if you wish) at a rate higher than the processing rate of the slowest component.

> A perfectly optimized component can't be a bottleneck but can still be the slowest component, trying to optimize that further will not speed up the system at all.

Correct -- but neither will optimizing anything else! That's the whole point!

> Describe to me how an IT system can produce results (e.g. tickets closed, if you wish) at a rate higher than the processing rate of the slowest component.

It can't, but the slowest component can be perfectly optimized and thus not be a bottleneck. You would fail to find a real bottleneck in this case, since you are just looking for the slowest one. Hence I have proven that your statement above was false: there are cases where the optimal strategy is not to just look at the slowest component.

If you have some other definition for bottleneck we can continue, but this "the slowest component" is not a good definition.

No, what you've done is you've failed to find a way to improve the system's behavior further. If you have the slowest component and you can't make it faster, then congrats: you cannot make the system faster.

You cannot build cars faster than you can mine metal, nor faster than you can put stickers on the windows on their way out the factory. You are done optimizing.

> There is definitionally no more efficient way to improve the performance of a system than sequentially targeting each new bottleneck in its performance and nothing else.

Doing so misses any state you can't hit via small iterations (you'll find a local minimum rather than global). It's easy to whack-a-mole every performance bug you can find and still miss that swapping to a columnar representation or using 32-bit keys instead of 64-bit is all you need to double performance for your program. Or that, in effect, the big ball of classes is executing something quadratic when linear solutions exist, but since that degradation isn't isolated to a small unit of code you can't identify it.
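To make the quadratic-versus-linear point concrete, here is a small sketch using duplicate detection as a stand-in problem. The function names are hypothetical, and in the scenario described above the quadratic behavior would be spread across many classes rather than sitting in one obvious loop like this.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of elements: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Remember seen values in a set: O(n) expected time for hashable items."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


sample = [3, 1, 4, 1, 5]
print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))  # True True
```

Both functions return the same answers. Tuning the inner loop of the first one would never turn it into the second, which is the kind of change a purely local fix can miss.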

Doing so, even when you _can_ iterate to the global minimum, isn't guaranteed to be the most efficient path either. Imagine, e.g., speeding up foo (the slowest thing [0]), then speeding up bar (the next slowest thing), and finding in your refactor of bar that you no longer need foo. All the optimizations that went into foo were wasted.

In practice, I've had a lot of success using "lines of code deleted" as an initial north star. Rather than trying to optimize from the get-go, focus on changes which remove a bunch of code, or better yet make it easier to remove even more code in the near future. Once you've trimmed it down 5-10x [1], it's probably already faster and less buggy, but at that point the task of actually optimizing is much easier. I don't know that doing so is the most efficient solution per se, but whack-a-mole performance fixes I think are usually worse.

[0] If bar calls foo, then obviously bar is actually slower than foo (ignoring mutual recursion), but you have to pick a small enough unit to optimize; it's not very helpful to note that main() is your longest-running function call. In a real refactor you very well might have the foo->bar sequencing e.g. by seeing that the majority of bar's time is spent calling foo, thus concluding that foo is the "culprit" or otherwise has some low-hanging fruit.

[1] Almost all software I've seen has a tendency to accumulate cruft over time. That isn't a criticism of any individual's abilities, just an observation that the march toward new features tends to invalidate previous assumptions and degrade the project's code quality over time. Moreover, the act of running the previous version teaches things you didn't know when that previous version was written, allowing even the same author to make better informed decisions. If/when somebody decides performance is important, there's almost always an opportunity to delete a ton of code.

> Doing so misses any state you can't hit via small iterations (you'll find a local minimum rather than global).

When I worked in manufacturing, we distinguished between “continuous improvement,” meaning the smaller improvements that will get you to a local minimum, and “radical transformation,” which will get you significant improvements and requires a redesign of the entire system.

> Doing so misses any state you can't hit via small iterations

Nothing about this method of problem identification requires small or local improvements to fix issues. In fact it makes it much easier to justify larger scale changes because you can have confidence in the impact it'll have on your total output.
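One way to picture that confidence is a before/after estimate for a larger redesign, assuming total output is set by the slowest stage of a serial pipeline. The stage names, the rates, and the idea of merging two stages into one are all hypothetical.

```python
def projected_gain(current, proposed):
    """Compare pipeline output before and after a proposed change.

    current:  rate per stage today (items per hour).
    proposed: stage table after the change; a redesign may add, remove,
              or merge stages rather than just speed one up.
    Returns (rate_before, rate_after, speedup_factor).
    """
    before = min(current.values())
    after = min(proposed.values())
    return before, after, after / before


# Hypothetical pipeline where "test" is the bottleneck.
current = {"build": 30, "test": 6, "deploy": 20}

# Hypothetical redesign that replaces "build" and "test" with one combined stage.
redesign = {"build_and_test": 18, "deploy": 20}

print(projected_gain(current, redesign))  # (6, 18, 3.0)
```

The estimate is only as good as the measured rates, but it ties the size of the change to its expected effect on output.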