Hacker News
by ChrisCinelli 1026 days ago
Without an outstanding company culture, most KPIs are almost useless.

While working at a big corporation we had a velocity initiative supposedly aimed to lead the company toward continuous integration.

"How long a PR stays open" was one of the KPIs in a dashboard.

I said: "Be careful with that!"

People started to close PRs and reopen new PRs with the same code.

Middle managers, and sometimes the person in the division who was the point of contact for the velocity initiative, were asking people to do that.

The script measuring this KPI was improved to look at the branch name and the code diff. Result? People changed the branch name and the EOL encoding in the new PR.
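To make the cat-and-mouse concrete, here is a minimal sketch of what the next round of that script could have looked like. This is not the actual script, which I can't reproduce; `normalize_eol`, `pr_fingerprint`, and the dict-of-files representation are all hypothetical. It assumes you can get the file contents at the PR's base and head, ignores the branch name entirely, and hashes only the files whose content differs once line endings are normalized.

```python
import hashlib


def normalize_eol(text: str) -> str:
    """Treat CRLF and lone CR as LF so EOL-only edits disappear."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def pr_fingerprint(base: dict[str, str], head: dict[str, str]) -> str:
    """Hash the real changes in a PR (hypothetical helper).

    `base` and `head` map file paths to file contents before and after
    the PR. Files that differ only in line endings are skipped.
    """
    digest = hashlib.sha256()
    for path in sorted(set(base) | set(head)):
        before = normalize_eol(base.get(path, ""))
        after = normalize_eol(head.get(path, ""))
        if before == after:
            continue  # unchanged, or an EOL-only change
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(after.encode("utf-8") + b"\0")
    return digest.hexdigest()


base = {"a.py": "x = 1\n", "b.py": "y = 1\n"}
first_pr = {"a.py": "x = 2\n", "b.py": "y = 1\n"}
# "New" PR: same change, plus line endings flipped to CRLF everywhere.
reopened = {"a.py": "x = 2\r\n", "b.py": "y = 1\r\n"}

print(pr_fingerprint(base, first_pr) == pr_fingerprint(base, reopened))
```

Two PRs with the same fingerprint are the same change under a new name. Of course, a trailing space or a reworded comment defeats this too, which is rather the point: the script can always be improved, and people can always find the next trick.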

Learnings? B and C players with questionable ethics screw companies quite rapidly.

In this climate, setting KPIs and aligning them with company values is futile.

5 comments

When I worked at a FAANG ~15 years ago, a new VP came in and heard, correctly, that our group didn't have enough automated tests. He created a requirement that every developer commit two new tests to the code base every workday. He had automation put in to monitor our compliance.

Within a couple of weeks, scripts were circulating to auto-generate and auto-commit tests. If the JVM we were using had a bug in adding random numbers together, we'd have known about it very quickly.

That's about the time I decided I should move on. I'm glad I did. The company I joined treated developers like adults and developers acted like adults. And we had great automated test coverage.

Obligatory mention: https://en.wikipedia.org/wiki/Goodhart%27s_law

Accountability is a big part of leadership, though. The anecdote basically just says the VP used the wrong tool for accountability, not that "adults" shouldn't be held accountable.
I think what was missing here was developer buy-in into the actual changes implemented, and making sure that they were reasonable and sustainable. Sounds to me like a blanket mandate was handed out without buy-in and also wasn't achievable, and the developers felt they had to work around it to get their real jobs done.
I agree. At the very least, it seems like he didn't communicate the "why". It certainly wasn't "to make as many tests as possible". I suspect the developers knew that, which is why it is also a little unprofessional on their part to treat it like that was the goal.
I think the bigger problem here was inappropriate expectations: "two tests a day". That's not a reasonable way to increase test coverage. And so the developers quite reasonably just tried to minimise the time spent on it.
>Number of unit tests isn't the best proxy for that goal…

However, I don’t think mindlessly creating tests is acting in good faith. It’s bordering on malicious compliance. I doubt they were thinking they can just knock out that metric so they can otherwise create better test coverage. (The OP conceded their coverage wasn’t good). Better employees would work to create a better understanding/goal. All of that points to some cultural problems.

Would adult developers not have made sure they had enough automated tests?
It probably depends whether they had been given specific instructions that directly conflicted with that.
And it also depends on whether they're given a timeslot for that and/or incentivized for it. Automated tests need time to maintain, especially when the flow or specs change and the tests need to follow.

I scrap all incentivized metrics when working on something urgent (important and soon), which is often the case. If the metrics are somehow incentivized, we'll start gaming them.

Now, if a dev takes part in production support, automated tests are meant to reduce the support workload, and there are time slots for writing them, I bet everyone will start doing it.

If a civil engineer is given specific instructions to make an unsafe bridge, that engineer does not obey those instructions. I think adult developers should behave similarly.
The analogy starts to break down. Most software is less mission-critical than the structural integrity of a bridge. Meaning the consequences for failure are probably annoyance rather than risk of death. Not always. I would imagine the software for nuclear reactor or aircraft controls is in a different category, but most software is not bridges.
Okay, but if developers decide that then at what point do you say it's okay to not treat them like adults?
I’ve often wondered what is the solution to Goodhart’s law. Obviously a business can’t abandon metrics. Perhaps this is where qualitative management skills come into play - humans in the loop making good decisions instead of blindly marching to the output of an automated reporting system.
I'd argue that the point of Goodhart's law isn't that metrics are bad, but that all metrics aren't created equal. In the case above, it seems like the real goal was improved code quality. Number of unit tests isn't the best proxy for that goal, so it wasn't the best choice of metric. You don't want developers creating tests for the sake of creating tests. (There's some irony here in that it's not the type of behavior I'd ascribe to professionals). The "solution" is a metric that's a better measure of what you actually want.
Even professionals have limits. I've worked in companies with the kind of management which kept adding these bad proxy metrics and pushing initiatives which had totally predictable bad effects on the product quality. Most devs used to fight the management on this, but grew progressively tired of this continuous fight. At some point the experienced devs either left or just gave up and started giving the management what they asked for. Us juniors followed suit. The management was happy, the actual workload diminished because we let go of "low priority" tasks, and we even got a juicy bonus at the end of the year because of how well we were doing.

The company tanked six months after that, now it doesn't exist anymore.

There's only so much you can do when the management is hellbent on doing stupid things.

You might be misconstruing the point. I certainly wasn’t insinuating more and more metrics. If anything, it’s the opposite: a core understanding of what’s really important helps you focus on the few metrics that matter.

In that context, I’m not really sure what point you’re making, unless it’s just to share a personal anecdote. Are you implying that management shouldn’t have any quantitative measures and should only be qualitative?

You need good quantitative measures, not just random numbers.

If you sell, say, water bottles, you probably want to know how many of them you can sell at any given moment, in order to not overbook and have to reimburse people. In this case, keeping track of how many water bottles you do have in stock probably helps; keeping track of how many labels with funny jokes you can stick on a shipping box in an hour doesn't. But if you start tracking the latter and handing down bonuses and layoffs based on it, people will max that metric out, at the expense of your actual stock capacity.

Quantitative measures are dangerous, especially in the hands of people who believe they are better than qualitative ones because they're "objective" or whatever. Not only are they not objective, but they are also better than qualitative ones at hiding their biases and soothing your own.

> Are you implying that management shouldn’t have any quantitative measures and should only be qualitative?

Many managers would do a lot better this way. They'd still make stuff up, but would at least be forced to admit it.

Goodhart's law is always in effect. It can't be solved because it's not a problem, it's a fact of nature with annoying implications. It's the echo of the observation that efficiency is fitness as its consequences ripple from the lowest levels of reality through systems made of people.

You can make an engine as effective as the laws of physics let you, but you can't solve the limits of thermodynamics. You can only do your best within them. Same with this, for maybe the same reason.

In my experience, there is no good solution long term except changing the metrics themselves every so often. When new metrics come in, as long as they are not totally boneheaded, they improve things. Then people learn how to game them and some of the more sociopathic folks start doing so, then others follow their example, and soon enough the metric is useless at best or detrimental at worst, and it is time to move to a new metric.

It helps to have someone with a hacker mindset think about the metric being designed so the obvious ways in which it could be gamed are taken care of and their own metrics/incentives are aligned with the company goal.

If you didn't have enough tests you weren't acting like adults. Why should he treat you like adults?
Maybe you are jumping to a conclusion too quickly.

How do you know what was really going on?

"He created a requirement that every developer commit two new tests to the code base every workday" seems a stupid requirement if you do not control the quality of the tests.

The same big corporation I wrote about above had a goal of 80% code coverage reported on dashboards.

I saw people writing tests just to run lines of code, without effectively testing anything.

Other people were "smarter" and completely excluded folders and modules in the codebase with low coverage from coverage testing.

Code coverage percentage numbers on a dashboard are a risky business. They can give you a false sense of confidence. Because you can have 100% code coverage and be plagued by a multitude of bugs if you do not test what the code is supposed to do.

Code coverage helps to see where you have untested code and, if it is very low (e.g., less than 50%), tells you that you need more tests. A high code coverage percentage is desirable but should not be a target.

The real problem is again the culture.

A culture where it is ok to have critical parts of the code not being tested. A large part of the solution here is helping people to understand the consequences of low code coverage. For example, collect experiences and, during retrospectives, point out where tests saved the day or how a test might have saved the day, so people can see how tests may save them a lot of frustration.

But again, when you give people a target and it is the only thing they care about, people find a way to hit it.

He said it himself.

> When I worked at a FAANG ~15 years ago, a new VP came in and heard, correctly, that our group didn't have enough automated tests.

There are a thousand reasons why reasonable people might have found themselves in that position. Maybe they inherited a code base after an acquisition or from some outside consultancy who didn't do a great job. Maybe management made a rational business decision to ship something that would make enough money to keep the company going and knowingly took on the tech debt that they would then have some chance of fixing before the company failed. Maybe it actually had very high numbers from coverage tools but then someone realised that a relatively complex part of the code still wasn't being tested very thoroughly.

If a team has identified a weakness in testing and transparently reported it, presumably with the intention of making it better, then why would we assume that setting arbitrary targets based on some metric with no direct connection to the real problem would help them do that?

had to mention https://fs.blog/chestertons-fence/

if a team does not have automated tests, but still manages to deliver working software - maybe tests are not adding as much value as the VP thinks?

the most important thing is feature delivery and integration tests, not automated unit tests where you test getters and setters with mock dependencies - absolutely useless busywork

Tests aren't exclusively about asserting current behavior -- they also help you determine drift over time and explicitly mention the implicit invariants that people are assuming.
Chesterton's fence isn't saying that the fence/test isn't necessary. It's saying you need to take the time to understand the broader context rather than make a knee-jerk assumption. To be more clear, just because developers don't see the need for better testing, doesn't mean more testing isn't needed. But it may indicate the VP didn't do a good job of relating why, which leads to the gamesmanship shown in the story.

Schedule isn't always the most important thing either. It's possible that delivering the software just means you've been rolling the dice and getting lucky. The Boeing 737MAX scenario gives a concrete example of where delivery was paramount. It's a cognitive bias to assume that "since nothing bad has happened yet, it must mean it's good practice".

This might be relevant if the original comment didn't say "correctly".

Also "not testing a lot" is not a Chesterton's fence. "not testing a lot" can't be load-bearing.

Without tests and instrumentation you won't even know if it's not working.
> The real problem is again the culture.

The culture led to fake tests instead of adding tests that were legitimately lacking.

Is that so different from saying they weren't acting like adults?

The phrasing could be called dismissive but I give that a pass because it was mimicking the phrasing from the post it replied to. The underlying sentiment doesn't seem wrong to me.

> Code coverage percentage numbers on a dashboard are a risky business. They can give you a false sense of confidence. Because you can have 100% code coverage and be plagued by multitude of bugs if you do not test what the code is supposed to do.

Or, conversely, I've been in charge of code that was really awkward to test... which ended up being really reliable. Plugin loading, config loading (this was before you spring boot'ed everything in java). We had almost no tests in that context, because testing the edge cases there would've been a real mess.

But at the same time, if we messed up, no dev environment and no test environment would work anymore at all, and we would know. Very quickly. From a lot of sides, with a lot of anger in there. So we were fine.

Need some tests for those tests!
Insufficient test coverage doesn't necessarily mean lack of self-discipline. It can also stem from project management issues (too much focus on features/too little time given for test writing).
An adult would have started to write the tests themselves so they'd understand what was going on around them. You don't just frown at people and hope for the best.
Yep, I worked at a place that was focused on everyone completing 100% of their jira tickets each sprint, just to get the metrics up. You didn't have to actually finish, it just had to look like you did to the bean counters.

If end of sprint came and you weren't done, the manager would close out the ticket, then reopen another similar one named "Module phase 2" or something similar for next sprint. One guy was an expert at gaming the system, and his ticket got closed and opened anew for about 3 or 4 sprints.

> Learnings? B and C players with questionable ethics screw companies quite rapidly.

No one should be surprised when employees respond to incentives, and blaming them seems a clear indicator of managerial failure: failure to tend to morale, failure to reward actually useful behavior, failure to articulate a vision.

> Without an outstanding company culture, most KPIs are almost useless.

Also, with an outstanding company culture, KPIs aren't really necessary.

So, when would they be useful?

I'm not as negative on KPIs as the previous line suggests though. They can be useful to shape direction when used carefully. But don't make them too long-lived, discard and create new ones as soon as they become gameable.

Fire those folks and move on. If you're a subordinate and your leaders are not firing those folks, quit and move on.

Gameable KPIs offer windows into the souls of your colleagues.