by hinkley 775 days ago
As computer people, I believe we have access to more information on this through the field of Queueing theory.

One of the aspects of Queueing theory is responsiveness, and a system with a saturated queue has none. I see this play out over and over again in both machine and human capacity planning. Even with Agile we can’t get stuff done in a satisfactory time frame because we always have a backlog sized for a team twice the size of the one we have, when responsiveness is maximized when the system is running at 50% of maximum throughput.
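
For a rough feel of why utilization matters so much, here is a minimal sketch assuming the textbook M/M/1 model (Poisson arrivals, exponential service times, one server). The model is an assumption, not something stated above, and `mean_response_time` is just an illustrative name. Under that model the mean time a job spends in the system is the service time divided by (1 - utilization):

```python
import math


def mean_response_time(service_time: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho).

    Assumes Poisson arrivals, exponential service times, and one server.
    """
    if utilization < 0.0:
        raise ValueError("utilization must be non-negative")
    if utilization >= 1.0:
        # At or beyond saturation the queue has no steady state.
        return math.inf
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    for rho in (0.5, 0.8, 0.9, 0.99):
        ratio = mean_response_time(1.0, rho)
        print(f"utilization {rho:.2f}: {ratio:.1f}x the bare service time")
```

In this model a system at 50% utilization responds in about twice the bare service time, at 90% in about ten times, and there is no finite answer at 100%. Real teams and services are not M/M/1, so read the numbers as the shape of the curve, not as a prediction.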

One of my mentors was really into Goldratt and Ohno, but The Goal got stuck in my tsundoku pile for years. I’m a third of the way through it now (I’m using audiobooks to get through books I “should” read but never do), and it is starting to turn into thinly veiled queueing theory, but from what my mentor said he refers to it instead through the metaphor of drum-buffer-rope. But there’s a lot more to this field, and as I said before, you can apply it to our applications directly, not just to the building of them.

5 comments

> Even with Agile we can’t get stuff done in a satisfactory time frame because we always have a backlog sized for a team twice the size of the one we have, when responsiveness is maximized when the system is running at 50% of maximum throughput.

Agile falls down when people misapply it, same as anything else. It's not just the size of the backlog; it's being able to limit work in progress so that you have the ability to adjust. What's more, management needs to get on board with the idea of probabilistic forecasting that's continually revisited, as opposed to trying to stuff complex work into Gantt charts and deadlines. Sadly, most of modern management refuses to make these changes, and too many folks in the trenches don't want to take ownership of their work and just want to be told what to do.

There’s a modification to Gantt charts that uses Monte Carlo simulation to come up with a more believable timeline, but nobody likes bad news so it’s a fringe Agile thing instead of mainstream.

Great companies are few and far between. Everyone else thrives on self deception.

But if you're doing Monte Carlo, you might as well just iterate and keep re-running the Monte Carlo as you burn through the backlog, because that's a better way of having up-to-date information than using any kind of Gantt chart.

You don't have to pick one or the other. A Gantt chart is just a pretty and easy to read graph based on the topological sort of activities and their dependencies and durations laid out in time. They also aren't meant to be static: unless you're working for a badly managed org that uses Waterfall, the Gantt chart gets updated as things progress and new information comes in.

If you have a backlog, and you don't mark what is dependent on what (in progress or also in the backlog) you're just hurting yourself. Once you add that information and some very basic estimation (even just scale of expected effort is enough) you can generate a Gantt chart and use Monte Carlo simulations to get an understanding of your time estimates.
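
A minimal sketch of that idea, under loudly stated assumptions: the task names, dependencies, and three-point estimates below are invented for illustration, durations are drawn from a triangular distribution, and every task starts as soon as its dependencies finish (no limit on how many people work in parallel). `graphlib` is in the standard library from Python 3.9.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical backlog: task -> (dependencies, (optimistic, likely, pessimistic) days).
TASKS = {
    "schema": ([], (1, 2, 4)),
    "api": (["schema"], (2, 3, 8)),
    "ui": (["schema"], (2, 4, 6)),
    "integration": (["api", "ui"], (1, 2, 5)),
}


def simulate_once(tasks, rng):
    """Sample one possible schedule and return the project finish time in days."""
    graph = {name: deps for name, (deps, _) in tasks.items()}
    finish = {}
    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        deps, (low, mode, high) = tasks[name]
        start = max((finish[d] for d in deps), default=0.0)
        finish[name] = start + rng.triangular(low, high, mode)
    return max(finish.values())


def forecast(tasks, runs=5000, seed=0):
    """Return the 50th, 85th, and 95th percentile finish times over many runs."""
    rng = random.Random(seed)
    results = sorted(simulate_once(tasks, rng) for _ in range(runs))
    return {q: results[int(q / 100 * (runs - 1))] for q in (50, 85, 95)}


if __name__ == "__main__":
    for percentile, days in forecast(TASKS).items():
        print(f"{percentile}% of runs finish within {days:.1f} days")
```

Re-running `forecast` with completed tasks removed (or their durations pinned to the actuals) is the "keep re-running it as you burn through the backlog" loop; the per-task finish times from a single run are what you would lay out as one row each on a Gantt chart.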

In my experience, estimates are the root cause of quality problems and expectation mismatches. No one treats them as estimates but as actual calendar times.

I've been fortunate to steer my company towards simply prioritizing work and communicating the prioritization to the rest of the company and more importantly, our customers. We don't give time estimates or timelines to customers, but provide constant updates on where something is. No one has complained about this in general. Of course, there are always exceptions - we resist them all we can, and that too is reflected as reprioritization of backlog.

  > No one treats them as estimates but as actual calendar times.
it's true, though i suspect this is partially because it's what the management chain wants: a date to report when it will be done (e.g. will my okr for this quarter be met or not?)

  > estimates are the root cause of quality problems and expectation mismatches
i have seen this over and over, most quality issues and incidents are caused by decent programmers rushing to meet the immovable deploy/release window... because 'you're not gonna make your estimate'....

I’ve only ever seen Gantt charts used by waterfall assholes who talk about commitments and promises to get free overtime.

Some techniques are good but ruined utterly by the company they keep.

But as I said, the problem isn’t really technical anymore, it’s emotional.

> Sadly, most of modern management refuses to make these changes, and too many folks in the trenches don't want to take ownership of their work and just want to be told what to do.

Interestingly enough this is something I see in management books and operations research books going back to the 70s. It's a lesson that hasn't been learned.

As for the ownership - I think that makes a lot of sense. People in the trenches know very well that they don't really own the thing, but are just at best responsible for it. I think that is perfectly fine, and the whole "ownership" language tends to obscure very real power dynamics.

> management needs to get on board with the idea of probabilistic forecasting that's continually revisited

From the manager's pov, though, that just sounds like guesswork. "When will my house be built?" "Eh, not sure, but there's a 60% chance the framing will be up by July".

Development managers need to learn to communicate on the same wavelength as their customers, and vice versa. It rarely happens.

The thing is construction people do talk like that.

I think that’s why rich people often make terrible customers. They are just as grouchy at plumbers and general contractors as they are at us.

Which reminds me, one of my life goals is to get a full rundown of GC tricks to apply to software development. I’m running out of time for that to make a quality of life difference.

You're conflating the complicated with the complex. Construction workers don't need Agile methods, which is why "there's a 60% chance the framing will be up by July" sounds so dumb. The physical properties of wood framing, electrical wire, shingles, and drywall haven't changed in decades. You can make detailed plans around these known facts, and workers generally know predictably what it takes to build a house.

Software is not like that. Codebases are too big, especially counting third-party dependencies. Tech debt is lurking everywhere. Customers don't know what they want until they see it. So yes, in enterprise-sized software, you need probabilistic forecasting precisely because you're NOT building a building. It's impossible to know things in enough detail up front to make big up-front plans that don't largely change like you could if you were building a house.

I say this with love, but you've never worked in construction, have you?

Architects' drawings are little more than nicely descriptive hopes and sketches of an intended idea, which a good contractor has to turn into an actual plan of work.

You want to see chaos? Go talk to the person running the development of a high end property in New York.

I have a way of getting people to talk to me about their professions. At the tail end of a water damage repair, the GC confessed to me that they flood customers with trivial choices to distract them from the illusion of choice in other areas. Like the physics of plumbing dictating where sinks can go.

As I mentioned up thread, I want to buy a GC beers and get them to tell me more, before I ever take another contracting gig.

You need probabilistic forecasting for everything. I used it with great success to forecast the costs of the general renovation of an apartment we were moving into. Seeing the shapes and ranges of distributions was very informative (and in particular informed the decision on whether and when to take an extra loan). Can't imagine doing it any other way now, even though I had to hack my way into doing it, because approximately none of the tools I know of support this out of the box.

(I ended up using Guesstimate for it - https://www.getguesstimate.com/ - pushing it to the limit of nearly hanging my browser.)

Problem is, most people seem to be overwhelmed by those ideas. It's not hard, but then again multiplication isn't hard either, and most people are afraid of that too. This is a problem because software tends to target the lowest common denominator, which is how we get a million Trello clones, but no tools that understand that work breaks into DAGs, not 2-level-deep trees, or that Gantt charts are good to have, or that probabilistic Gantt charts would be even better.

> You're conflating the complicated with the complex. Construction workers don't need Agile methods, which is why "there's a 60% chance the framing will be up by July" sounds so dumb. The physical properties of wood framing, electrical wire, shingles, and drywall haven't changed in decades.

When's the last time you saw a construction project that landed on time? Construction projects are a classic example of forecasting difficulty. Lots of things have to go smoothly at the right time, including supply chain, coordinating work from multiple organizations (including local governments), and the weather has to play nice.

> too many folks in the trenches don't want to take ownership of their work and just want to be told what to do.

And do you blame them?

If pressed I’ll put this as, “once you substituted your judgement for mine this became your problem.”

I’m not going to enthusiastically absorb the consequences of bad decisions I already tried to route us around.

A backlog is not a queue. It's just a mutable list. A principle of Agile is that you can change your priorities and reorganize the backlog to put the currently highest-priority stuff first.

(This isn't unique to Agile - any bug database could do this, though they don't typically stack-rank things.)

Putting high-priority tasks first increases responsiveness for them, but will starve lower-priority tasks. That's inherent, but at least management gets to prioritize.
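
A toy illustration of that trade-off, with made-up numbers: each tick a high-priority and a low-priority item each arrive with some probability, and the team finishes exactly one item per tick, always taking the highest-priority one. The arrival probabilities, the one-item-per-tick capacity, and the function name are all assumptions for the sketch.

```python
import heapq
import random

HIGH, LOW = 0, 1  # smaller number is served first


def mean_waits(ticks=10_000, p_high=0.45, p_low=0.45, seed=1):
    """Return the mean wait in ticks for each priority class."""
    rng = random.Random(seed)
    backlog = []  # heap of (priority, arrival_tick, sequence)
    seq = 0
    waits = {HIGH: [], LOW: []}
    for tick in range(ticks):
        # At most one new item per class per tick.
        for priority, p in ((HIGH, p_high), (LOW, p_low)):
            if rng.random() < p:
                heapq.heappush(backlog, (priority, tick, seq))
                seq += 1
        # The team completes one item per tick, highest priority first,
        # oldest first within the same priority.
        if backlog:
            priority, arrived, _ = heapq.heappop(backlog)
            waits[priority].append(tick - arrived)
    return {p: (sum(w) / len(w) if w else 0.0) for p, w in waits.items()}


if __name__ == "__main__":
    result = mean_waits()
    print("high-priority mean wait:", result[HIGH])
    print("low-priority mean wait:", result[LOW])
```

In this setup high-priority items never wait, because at most one arrives per tick and it always goes first, so all of the queueing delay lands on the low-priority class. Raise the high-priority arrival rate toward the team's capacity and the low-priority items wait longer and longer.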

> A backlog is not a queue.

That's only slightly true, and it's a dangerous assertion from a process standpoint to say it is so. Everything is queues. Work in progress, undeployed code, dark features, sales pipelines.

There is a queue of requirements we have defined but haven't acted upon. That's contained within the backlog, along with bug reports and wishful thinking. The backlog is approximately a superset of the incoming feature request queue (modulo anything that skips the backlog and goes straight into WIP).

To expand on your point: Queueing theory applies whether the thing is a proper FIFO queue, a LIFO stack, a statically prioritized priority queue, a dynamically prioritized priority queue, or a bunch of cards grabbed randomly out of a hat. If things keep coming in faster than they can be handled, the queue, no matter its form, will just continue to grow. That a backlog is not a proper FIFO queue in most orgs doesn't change this fact.
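
A small sketch of that point, with assumed numbers (3 items arrive per tick, 2 get handled): the only thing that changes between runs is how the next item is picked, and the function and dictionary names are illustrative.

```python
import random


def backlog_after(ticks, arrivals_per_tick, capacity_per_tick, pick):
    """Return how many items are still waiting after `ticks` rounds."""
    backlog = []
    next_id = 0
    for _tick in range(ticks):
        for _arrival in range(arrivals_per_tick):
            backlog.append(next_id)
            next_id += 1
        # Handle as many items as capacity allows, chosen by `pick`.
        for _slot in range(min(capacity_per_tick, len(backlog))):
            pick(backlog)
    return len(backlog)


_rng = random.Random(0)
DISCIPLINES = {
    "fifo": lambda items: items.pop(0),  # oldest first
    "lifo": lambda items: items.pop(),  # newest first
    "hat": lambda items: items.pop(_rng.randrange(len(items))),  # random card
}

if __name__ == "__main__":
    for name, pick in DISCIPLINES.items():
        print(name, backlog_after(100, 3, 2, pick))
```

Every discipline ends with the same backlog size, because the size depends only on arrivals minus completions. What the discipline changes is which items wait, not how many.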
I still think "mutable list" describes the situation better than "queue," at least for programmers familiar with common data structures. No argument that queuing theory is useful.

Defining "responsiveness" in terms of everything that anyone has ever wished for seems like a bad thing, though. We can have a brainstorming session that comes up with a long list of features that would be nice to have, end up throwing them out, and that's perfectly fine.

I think you have a poor understanding of what queueing theory is. You should read more and type less.

Attacking someone's knowledge level does not dismiss their argument, which is that you might be imprecise in your use of the term responsivity in your first statement, and could be using it to impress rather than inform your audience.

Let's get to the bottom of this in an illuminating and polite way, without further insults.

You made two statements about responsivity:

1. If your queue is 100% saturated you have 0 responsivity.

2. It's common to see maximum responsivity if the team is only pushing 50% of their maximum throughput.

I'm assuming that you are either

1. defining a responsivity metric for backlog items and using it consistently in both cases

2. Using responsivity in the imprecise business consultant sense of the word which is being criticized

Case 1: You're a serious Queue theorist

If you're doing it in the first sense then could you please give us a more technical explanation of which responsivity metric you're talking about and how these relationships hold?

Some questions there:

1) I can't find a responsivity metric in basic queueing theory. I find average occupancy, throughput, waiting time, service time, etc - all metrics which describe /aspects of/ responsivity. Can you point us to more specific and holistic responsivity metrics you have in mind? What makes it the "right" metric to capture something as abstract as responsivity?

2) How can a saturated queue have no responsivity? It seems that a saturated queue - which I define as a full queue that is still serving requests - still responds in the average waiting time, plus now there's a chance that the request is rejected by a full queue. Assuming the queue is still serving, there should be some nonzero probability that a request will be served, because a position in line opens up, for a brief period, every time a request is completed. So responsivity can't be strictly zero, can it? It would make sense to me here if our responsivity metric drops so close to zero, compared to normal functioning, as to not make any effective difference. For example, if 95% of our requests are being rejected, and all accepted requests have awful waiting times, then it makes sense that responsivity should be considered effectively nil.

3) Where can I learn about the model where maximum responsivity requires some sacrifice on throughput? Usually for this kind of result we have some curve, we take its derivative, and it turns out the critical point is some equilibrium or maximum. Do we really have the ability to do this for coding teams and their backlogs? That could give us serious objective power in negotiating our work throughput! But it requires a trustworthy model.
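
One way to make question 2 concrete is the standard finite-capacity M/M/1/K model, sketched below. Choosing that model is an assumption on my part (nobody above named one), and `blocking_probability` is an illustrative name. It gives the fraction of arrivals that find the queue full and are rejected:

```python
def blocking_probability(rho: float, capacity: int) -> float:
    """Probability that an arrival is rejected in an M/M/1/K queue.

    rho is the offered load (arrival rate / service rate). capacity is K,
    the total number of slots in the system, including the one in service.
    """
    if rho < 0 or capacity < 1:
        raise ValueError("rho must be >= 0 and capacity >= 1")
    if rho == 1.0:
        # Limit of the general formula as rho approaches 1.
        return 1.0 / (capacity + 1)
    return (1.0 - rho) * rho**capacity / (1.0 - rho ** (capacity + 1))


if __name__ == "__main__":
    for rho in (0.5, 0.9, 1.0, 2.0, 10.0):
        rejected = blocking_probability(rho, 9)
        print(f"offered load {rho}: {rejected:.1%} of arrivals rejected")
```

In this model even a queue offered twice what it can serve still accepts roughly half of the arrivals, so acceptance never reaches exactly zero; what degrades is that accepted requests almost always join a nearly full queue and wait close to the maximum. Whether that counts as "no responsivity" depends on the metric, which is the question being asked.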

Case 2: You're a business consultant and you're laying it on a little too thick

Finally, if you're doing it in the second sense, then you are using the term in the sense that is being criticized. In particular, using it without a formal definition, in a way such that a business audience would hear "responsivity" as whatever metrics they care about most. That would mean you use queuing theory terminology to misinform and manipulate those unfamiliar with queuing theory - probably the most used application of queuing theory in the industry. If you're intentionally doing this, and willing to insult people who call you out for it, we probably couldn't persuade you to own up to it or stop. But if you're not aware of it, you might reconsider how you use queuing theory terms, and commit to being accurate and objective. I state this not only for your benefit but for my own benefit, and for the benefit of anyone who might use technical terms imprecisely! It's always best to be able to ground technical statements in objective theory.

Conclusion

Assuming that you are using the term accurately, with a particular metric in mind, I'd love to learn about it and analyze the metric in our two pet cases. If you're being shady, I hope you'll recant and improve the accuracy and honesty of your language. Thanks for your patience and dialogue!

OP mentioned The Phoenix Project [1]. It's The Goal applied to IT.

One key issue the protagonist has to overcome is how to address the issue of an endless backlog.

[1] https://www.goodreads.com/en/book/show/17255186

> we always have a backlog sized for a team twice the size of the one we have

While that is certainly true, a side effect is that the oldest items in the backlog become obsolete over time.

It’s more of a stream or a pipeline than a queue.

Wait till you realize optimizing data flow through an API process serving multiple requests is the same meta problem as optimizing value flow through a development team. :)