Hacker News
by rrrrrrrrrrrryan 2243 days ago
But how can they set goals without any visibility into what a team can typically accomplish in a given time period? How can they identify better performers and worse performers?

I'm not saying agile is the right solution (it probably isn't), but expecting higher-ups to fund a black-box team is kind of naive.

3 comments

Ye olde SV Scrum clip[1]

"This just became a JOB..." -Gilfoyle

[1]https://www.youtube.com/watch?v=oyVksFviJVE S01E05 Scrum scene

I do like Agile; the typical problem is managers misusing it to micromanage, chase useless metrics, etc.

Measure the deliverable, not the deliverers.

Reporting on random internal stuff instead of the actual problem at hand is the #1 problem I see with corporate reporting, everywhere. I see various combinations of people wasting time measuring:

1) The thing that is easy to measure, typically money or time.

2) The things they "understand", typically people for HR, compliance for legal, money for finance, etc...

3) The things their manager wants to know, no matter how irrelevant that is to executing their own job well.

Meanwhile, what they should be measuring is the quality of the end product or the overall end result for the external customer.

It doesn't matter one iota if Bob the Developer Guy missed an internal 3-day deadline that John the Manager made up on the spot if the end product is a winner in the market and makes the users ecstatically happy to part with their money.

This happens everywhere, with everybody. For an IT-centric example, the common one I see is:

Helpdesk: "The users are complaining that the app is slow"

Admins: "The load is only 10%, but fine, we'll add more capacity!"

Helpdesk: "The app is still slow!"

Admins: "The load is only 5%! They should have no reason to complain!"

Do you see the issue? No, seriously: do you? Because practically nobody does, in my experience. Take a minute.

What happened here is that the admins measured the thing that is easy for them to measure: the load. There's a cute little bar graph in VMware, or a chart in their network appliance, or whatever. What they should have been measuring is latency from the end-user perspective, but that's hard to measure and practically no product tells you this number out of the box. So their entire process, their reporting, their troubleshooting, their forms, requests, everything becomes focused on the thing that they can see and control. Even if it's pointless, ineffective, and basically a waste of everyone's time and effort.
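To make the load-versus-latency gap concrete, here is a minimal sketch with made-up numbers. The `Request` record, the 60-second window, the 4 cores, and the sample values are all assumptions for illustration, not measurements from any real system. It shows how a server can report very low utilisation while a slice of requests still takes seconds, because time spent blocked (locks, timeouts, a slow dependency) never shows up as load.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    cpu_s: float   # seconds the server actually spent computing
    wait_s: float  # seconds spent blocked on a lock, timeout, slow dependency, etc.


def utilisation(requests: List[Request], window_s: float, cores: int) -> float:
    """Fraction of available CPU time consumed during the window."""
    return sum(r.cpu_s for r in requests) / (window_s * cores)


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


# Hypothetical minute of traffic: most requests are quick, but one in ten
# sits blocked for 8 seconds without burning any extra CPU.
requests = [Request(cpu_s=0.05, wait_s=0.1)] * 90 + [Request(cpu_s=0.05, wait_s=8.0)] * 10

load = utilisation(requests, window_s=60, cores=4)
latencies = [r.cpu_s + r.wait_s for r in requests]

print(f"load: {load:.1%}")
print(f"p50 latency: {percentile(latencies, 50):.2f}s")
print(f"p95 latency: {percentile(latencies, 95):.2f}s")
```

In this toy data the load graph looks idle, yet the 95th-percentile user waits about eight seconds. Adding capacity would lower the first number and leave the second untouched.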

This happens with developers in exactly the same manner. Software quality is stupid hard to measure. Long term supportability is borderline impossible to measure without a time machine. Technical debt is hard to even explain to a manager, let alone keep tabs on in terms of numbers. So what's easy to measure? Time! Deadlines, sprints, release dates, etc... That's super easy.

That's why inevitably the unimportant internal time metrics become critical to everybody, but the actually important metrics aren't even measured and become invisible to management until it's far too late.

As a manager, I want to measure leading indicators of success and failure. Absolutely, feedback control based on the real output is important. I must measure that! But I’m always looking for ways to predict that, so I can steer more gently. What’s a leading indicator of a crisis? A late team struggling to make a deadline. What’s a leading indicator of that? Mismatch between estimates and performance. I need to know about bad point estimates because if I don’t fix it—I mean fix the PM’s misalignment—they’re going to push the team into something dangerous.
The problem is that these "leading indicators of success and failure" aren't. A late team struggling to meet a deadline might be a sign of imminent failure, or it might be the team working hard to do something that is genuinely difficult to do.

The core problem with Agile (and with most forms of software management) is that it massively overweights "first mover" advantage. I keep hearing, as a justification of agile, that software needs to be delivered quickly so that the company can go to market first and gain market share while its competitors are still floundering. But, in practice, that's hardly ever true. I can't name a single product that succeeded solely because it was on the market first. I can name many products that were first to market and failed because they were clunky and difficult to use.

Heck, Apple's entire business model consists of being second to market with a product that is more polished and easier to use than its competition.

Yes, if a developer or team is well and truly stuck (as in spinning their wheels on the same problem, week after week), that's a problem. But you don't need Agile to tell you that. A simple weekly status meeting with incremental demos is sufficient. The only thing Agile does is create a bunch of graphs that allow management to comfort themselves with "story points" and "velocity" so that they don't have to confront the hard reality that they have no idea what it is they want to build.

What about maintaining an environment where the team felt safe to communicate to the pm that there was something wrong?

And for the pm to feel safe enough to communicate to you something is wrong.

This feels like Taylorism. Knowledge work isn’t factory work.

I totally understand keeping tabs on delivery speed to enable the team to benchmark themselves, but the act of identifying a problem from that (if there is one) should be the team's responsibility imo.

As a manager, my job is to enable other people to do their job the best they can.

Part of the reason that DevOps culture became a professional movement was to treat both as an end-to-end problem.

Which stops misunderstandings like throwing the code to “Ops” and expecting the VMware admin to understand how an app behaves.

User response time is a fairly simple metric to measure if you’re a developer. They should expose that, and other metrics, that the customer values.
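As a rough sketch of what "exposing that" could look like from the developer side, the snippet below wraps a request handler in a timing decorator and keeps the samples in memory. The names `timed`, `LATENCIES`, `handle_search`, and `summary` are hypothetical, and a real service would more likely ship these numbers to whatever metrics system it already uses rather than a module-level dict.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, Dict, List

# In-memory store of response times per operation (assumption: one process).
LATENCIES: Dict[str, List[float]] = defaultdict(list)


def timed(name: str) -> Callable:
    """Decorator that records how long each call to the wrapped handler takes."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even when the handler raises.
                LATENCIES[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("search")
def handle_search(query: str) -> List[str]:
    """Hypothetical handler standing in for a real user-facing operation."""
    return [word for word in ("alpha", "beta", "gamma") if query in word]


def summary() -> Dict[str, Dict[str, float]]:
    """Per-operation call count and worst observed response time."""
    return {
        name: {"count": len(samples), "max_s": max(samples)}
        for name, samples in LATENCIES.items()
    }
```

Calling `handle_search("al")` and then `summary()` gives ops and the product owner a number that tracks what the user experiences, rather than what the hypervisor reports.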

This is one of the reasons I’ve enjoyed some “true” agile teams with a strong product owner: they encourage an end-to-end focus on outcomes.

I feel like Jonathan Blow's jai language looks to do things like this as part of the language, at least to the developer.
"You can't manage what you don't measure"

vs.

"Not everything that can be counted counts, and not everything that counts can be counted"

I think both statements are true.
People manage things using their judgement and qualitative observations, all the time. We can accuse them of bias, but that has to be weighed against the fidelity of the metrics.
This is a good breakdown of this really common dynamic.

It is recognizable to many people, some of whom use it for their benefit, which can be very effective. When I learned that last fact a lot of things made more sense.

This makes important points.

But time and money are the things by which companies live and die. Not keeping tabs on them is suicidal.

OTOH measuring time and money consumption of some technical internal steps is just uninformative.

> What happened here is that the admins measured the thing that is easy for them to measure: the load.

Nah, you have it completely backwards. If the users said “this specific job took 5 minutes today but was only 1 minute yesterday”, that’s actionable: you can, e.g., look at what changes were deployed overnight.

But users always say “the system is slow”, even if they have only the vaguest idea of what “the system” is, and even if it’s actually faster than yesterday. It’s not really clear what any sysadmin can do other than spending hours every day painfully extracting the details from the user only to find nothing is wrong. Every day, forever.

> It’s not really clear what any sysadmin can do

That's not true. It's just that most sysadmins don't bother to upskill to find out what they can and should be doing.

> painfully extracting the details from the user

Asking users for any information is a recipe for disaster. Much like witnesses to a murder that can't agree on the most basic details, users inevitably conflate totally unrelated things. E.g.:

"Citrix is slow?"

"Okay, how so... are button presses slow to respond to a click?"

"I couldn't log on. Something to do with my password. It's slow."

"ಠ_ಠ"

So don't ask. Don't rely on your users at all. Build synthetic transaction tests that act like users. Measure end-to-end latency. Sit down with them and watch them work. Don't rely on their verbal feedback, use your own eyes. Use your tools. Measure. Then measure some more.
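A synthetic transaction test can be quite small. The sketch below is one possible shape, not a description of any particular monitoring product: `time_transaction`, `probe`, and the two `fake_*` steps are hypothetical names, and the sleeps stand in for scripted actions such as logging on or opening a report, which in practice would drive the real client or an HTTP session.

```python
import statistics
import time
from typing import Callable, Dict


def time_transaction(step: Callable[[], None],
                     clock: Callable[[], float] = time.perf_counter) -> float:
    """Run one scripted user action and return its wall-clock duration in seconds."""
    start = clock()
    step()
    return clock() - start


def probe(steps: Dict[str, Callable[[], None]], runs: int = 5) -> Dict[str, Dict[str, float]]:
    """Run each named step several times and report median and worst latency."""
    report = {}
    for name, step in steps.items():
        samples = [time_transaction(step) for _ in range(runs)]
        report[name] = {
            "median_s": statistics.median(samples),
            "max_s": max(samples),
        }
    return report


# Hypothetical stand-ins for real user actions (assumption: replace these
# with code that performs the action the way a user would).
def fake_login() -> None:
    time.sleep(0.02)


def fake_open_report() -> None:
    time.sleep(0.05)


if __name__ == "__main__":
    results = probe({"login": fake_login, "open_report": fake_open_report})
    for name, stats in results.items():
        print(f"{name}: median {stats['median_s']:.3f}s, worst {stats['max_s']:.3f}s")
```

Run on a schedule from where the users sit (same network segment, same client path), this yields the end-to-end number that load graphs do not show. The injectable `clock` is there only so the timing logic can be tested without waiting.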

Conversely, capacity metrics are largely irrelevant in the era of 10 Gbps networks and 64-core server CPUs. Focus on latency. Look for delays. Timeouts. Deadlocks. Firewall packet drops. That kind of thing.

> only to find nothing is wrong. Every day, forever.

Of course something is wrong! Something is practically always wrong, that's why the users are complaining!

Here's a fun rule of thumb for you: For every 1 user that complained, there are between 100 and 1,000 that had the same issue but shrugged it off and didn't call support.

I got that from a scientific paper. I couldn't believe it, so I measured it in a large 10K user system. The error-to-call ratio was about 500-800 in ours. It blew my mind, and it blew the minds of a lot of people in IT management.

We started gathering every error, tracking every possible latency measurement we could, and it was a horror show. 30K app crashes per day. I shit you not. That's about 3 per user per day! Data loss. Hangs. Login failure rate of nearly 50%.

It took months to triage the issues, push patches, and apply workarounds. We had to rewrite several components. We eventually got the errors down to less than a hundred per day. Believe me, that was a real achievement.

Users were so happy they were begging to be migrated to the new system instead of pushing back and refusing to upgrade.

If the users are complaining, something is probably very wrong and you just don't know it. Go look.

> For every 1 user that complained, there are between 100 and 1,000 that had the same issue but shrugged it off and didn't call support.

I wish I had your user community. Here in Wales the ratio is reversed, I guarantee it.

By assigning problems to specific responsible individuals, and noticing the existence / quality of the solutions they produce? If anything, Agile obscures individual performance, in that it treats everyone as fungible and every part of the system a commons.