Hacker News
by luiscleto 2637 days ago
Reminds me of Goodhart's law[1].

We have known this for a long time, but it is hard to have an alternative system which scales well with huge organizations. For startups and small companies, I could see an informal system working pretty well, but as the company grows to hundreds or thousands of employees, it becomes necessary to standardize and have some kind of metrics used for reporting and evaluations. This will inevitably shift the company's culture towards gaming those metrics.

Somewhat like grading systems in education. High grades don't necessarily mean you will be capable of generating more value to society than average grades or even low grades. And students often become good at improving their grades without that actually adding much value. But there is a correlation. And we don't have many better (non-experimental) alternatives that I'm aware of.

[1] https://en.wikipedia.org/wiki/Goodhart%27s_law

> This will inevitably shift the company's culture towards gaming those metrics.

Yes, but that's only the beginning of the story. People gaming metrics is a type of security problem, in that "attackers" try to game the metrics while "defenders" try to make them less game-able by improving the accuracy and precision of how the metrics are gathered so that the final numbers continue to tell a valuable story over time.

The issue isn't that metrics can be gamed; it's that organizations which pride themselves on being data-driven rarely make the investment in hiring blue teams and red teams to defend and attack the metrics. If you appreciate that investing in cyberdefense is key to protecting your company from cybersecurity threats, why can't you appreciate that investing in "metricsecurity" is key to protecting your company from "metricsecurity" threats?

> The issue isn't that metrics can be gamed

The issue is that many of the most important parts of many organisational activities can't be easily measured through simple metrics at all.

I think this is a rubbish excuse made by people who don't understand the role of such a "blue team". If the organization initially adopts metrics that are counter-productive (e.g. measuring feature completion and not technical debt), it is the role of the blue team to change the metrics such that the neglected areas are properly accounted for in final performance metrics. No metrics should be final; only iteratively tuned to achieve results that are more and more indicative of the underlying performance.

It is difficult, but still possible, to measure technical debt and other "hard" metrics. It is precisely the job of the blue team to deal with that.

Just because you've made a team and given them the job of doing something (incredibly) difficult, doesn't mean you've actually solved that problem or even should expect them to solve it most of the time.

You're absolutely right that having a "blue team" is much better than not having one - but it doesn't mean that calling out the reality that many organizational activities can't be easily measured is a "rubbish excuse made by people who don't understand".

And how do you measure the person who takes time out of their day to help a colleague in another department, who is having a tough time understanding an issue, so the person takes 15 minutes out to help them, boosting morale and overall company cohesion?

Or should they be penalised for wasting 15 minutes?

First of all, not every company wishes to incentivize this behavior. Peopleware reminds us that phone calls and other real-time interruptions are big drags on productivity for knowledge workers who need to concentrate. Every time somebody clears something up in a verbal conversation, the explanation goes unrecorded in the kind of documentation that would help future people with the same confusion.

But say you do wish to incentivize that. You can, if you can track the medium of exchange. Take everyone's phone records, reward short conversations, but disincentivize conversations that are too short ("sorry, not now, bye") or too long (social chatting in place of productivity). If you can cost-effectively put it through some kind of ML classifier that could tell you if the conversations were helpful internal support, personal, etc., then all the better. Translate that into some kind of score and factor it into whatever formula that produces personal KPIs.

Not saying it's easy. Just saying it's possible, and it's realistic if you have a team whose full-time job is to come up with these kinds of solutions.
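
To make the duration-based part of that concrete, here is a minimal sketch. It assumes call records with a caller and a duration are available. The `Call` type, the function names, the two thresholds, and the +1/-1 scoring are all hypothetical placeholders, since no numbers are given above, and the ML classifier step is left out entirely.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the comment above gives no concrete numbers.
TOO_SHORT_SECONDS = 60       # e.g. "sorry, not now, bye"
TOO_LONG_SECONDS = 20 * 60   # e.g. social chatting in place of productivity


@dataclass
class Call:
    caller: str
    duration_seconds: int


def call_score(call: Call) -> int:
    """Return +1 for a call inside the rewarded window, -1 otherwise."""
    if call.duration_seconds < TOO_SHORT_SECONDS:
        return -1
    if call.duration_seconds > TOO_LONG_SECONDS:
        return -1
    return 1


def helpfulness_scores(calls: list[Call]) -> dict[str, int]:
    """Sum the per-call scores for each caller."""
    totals: dict[str, int] = {}
    for call in calls:
        totals[call.caller] = totals.get(call.caller, 0) + call_score(call)
    return totals
```

The per-caller totals would then be one input to whatever formula produces personal KPIs. Anyone who learns the thresholds can aim their calls at the rewarded window, which is exactly the gaming problem this thread is about.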

Sorry, I'm not doing this on the phone. Bob and I are doing this over a quick coffee break, scribbling on the back of a napkin.

On the other hand, if this is a priority and incentivized, what’s to stop it from going too far, where employees would get extra credit for chatting for entertainment?

Exactly why this is a good example of an organisational activity that can't be easily measured through simple metrics.

Depends, do you want that to happen or not? You seem to be making an assumption this is good, but in some organizations this would be a bad thing to penalize. I'm not sure why you would do this in engineering, but it is important to acknowledge that this isn't a universal good and so maybe your company wants to discourage it for some reason.

Assuming you want people to help each other, you need to capture metrics on it. A few years back I had a metric of helping n people in a different department: I kept track of those interactions so I had something to report at the end of the year.

Was that a personal metric? Metrics created for yourself are subject to less gaming because when you start lying to yourself about them, you will start to wonder why you keep those metrics at all.

If that was a company-issued, top-down metric, I hope it wasn't defined literally as "helping n people in a different department", because that has enough wiggle room to sail an aircraft carrier through. The difficulty of creating a good metric here comes from the difficulty of defining what exactly it means, in company context, to "help other people" - and also what it explicitly doesn't mean.

How would you capture those metrics? Requiring people to document all such interactions is impractical, and open to easy abuse.

There is a philosophical debate underlying this. Take the analogy of an ML algorithm:

We know many algos are DESIGNED to be a black box, to be unexplainable. Red team iterates an incredibly effective algorithm that produces the desired outputs (with unseen risk built up as well). Blue team, in order to manage risk, is tasked with .. explaining the unexplainable process? An impossible task.

Is it possible that human/organizational processes can also be unexplainable?

What metric do you use to judge the success of the blue team, and what prevents them from gaming that metric?

In practice, you wouldn't really need to have a metric for the blue team, for the same reason you don't need metrics in a 5-person startup. Management stays close to the blue team as a substitute for being "close" to thousands of people, and because management is close to the blue team, it can judge their output without needing a formal metric.

Maybe if you had an organization that was big enough to require several blue teams (a military or government?), then you'd need a metric for blue teams. Such a metric would probably compare the sub-KPIs of each sub-organization that each blue team was responsible for, including metrics on customer satisfaction, and warrant investigation if the metrics went under.

The blue teams can't really game that metric without the entire organization falling over, and if that happened, the executives would be to blame, not the blue team.

I know a girl who works for United Airlines' "blue team." It's a small (5-8 people?) inward-facing operations consulting group that reports directly to Oscar, the CEO.

It's composed of engineers and they analyze existing processes and create new metrics all day long.

They do worry about their own careers/promotions etc but the group is too small for there really to be any opportunities to "game" anything beyond basic politics.

> In practice, you wouldn't really need to have a metric for the blue team

So the metrics are BS, and the company is really being run by subjective intuition, for which the metrics merely provide an impersonal rationalization.

The problem though is that executives need some set of a few hundred numbers they can use to track the state of a company. For just my job alone I could generate more metrics than that to properly characterize our state and problem space -- but then an exec would need to deal with thousands and thousands of numbers.

Sorta sucks, but that's how it goes. Good execs manage a sufficiently decentralized system, but they still need SOME set of summary numbers.

Good theory, but in practice challenging the wisdom of metrics that really important managers have put into place tends to be a career-limiting move.

The very important managers themselves should care about evaluating the metrics they impose, but typical unspoken manager performance metrics include "episodes of dissent" (addressed by discouraging advice), "displays of weakness" (addressed by threats and aggressive attitude), "time management" (addressed by not thinking matters through), and so on.

>why can't you appreciate that investing in "metricsecurity" is key to protecting your company from "metricsecurity" threats?

Because if I'm a C-level or EVP-level person responsible for this type of decision, why would I want to spend money on a team of people fighting against my ability to get a big fat bonus?

> ...but as the company grows to hundreds or thousands of employees, it becomes necessary to standardize and have some kind of metrics used for reporting and evaluations.

Does it? Is it inconceivable that each part works to its own goals and metrics consistent both with its own values and those of the wider organisation?

Yes. Happens all the time in large companies.

https://www.nytimes.com/2010/02/04/opinion/04brass.html?page...

Article is from 2010

Microsoft, for example, was plagued by divisional infighting: ClearType was sabotaged.

> Is it inconceivable that each part works to its own goals and metrics consistent both with its own values and those of the wider organisation?

Beyond a certain number of people, absolutely it is.

You don't address the parent's point.

Trying to group 1000 people together and measure all of them, sure that seems insurmountable for an informal system.

Taking that same group of 1000, splitting them into subgroups of 7 and giving those individual groups their own goals and autonomy to pursue them may again allow for informal performance measurement.

> Beyond a certain number of people, absolutely it is.

That's because of the lack of trust, and the flow of responsibility and control.

If you structure a company such that the employees themselves have to be responsible for their output in such a way that higher output leads to more money for them, you'd not have this problem. For example, contractors who work on a results basis.

I think you’d prove the problem though, because everybody would want to work on projects/bugs/teams that have clearly measurable results. I work at a company with clearly(ish) defined “this is what a Level X engineer does” so you can measure yourself against the current and next level for promotion’s sake. I’ve mentioned to my manager that I struggle with authentically choosing projects/tasks: I want to work on things that I see as valuable to me and/or the company, but I also want to be promoted, and some projects are clearly better promotion material but are not necessarily as valuable... I strive for value but probably overthink the situation :p

> everybody would want to work on projects/bugs/teams that have clearly measurable results.

So let that happen. Those who can't can leave, and see how the company actually fares. When they find that some crucial roles aren't taken and the company starts "failing", then they will surely admit that said role is needed, and reward it accordingly.

But perhaps there are indeed roles that aren't useful, and therefore actually could've been eliminated but for the stigma - so maybe this is the way to go forward.

> employees themselves have to be responsible for their output in such a way that higher output leads to more money for them

But what metric would you use to measure output that solves the gamification problem?

Even for contractors or sales people (where you could use the sales volume), this could lead them to favor short term results and compromise the long-term health of the company (e.g. by favoring quick, low-quality solutions by contractors, or selling features that don't yet exist and creating unsustainable roadmaps by sales people).

You could reward them in the same way that executives increasingly receive their performance-based pay: in shares or other financial instruments with an enforced holding period that are linked to the health of the business / business unit as appropriate.

While that's an interesting approach for compensation which might mitigate knowingly bad/irresponsible decisions, it doesn't look like it would address the core issue here of having to choose a metric to base compensation on.

Maybe the gaming effect would be lessened by that compensation approach, but at a very large scale org, I doubt that it would. Although, it would be interesting to see real life studies of this, in case such practices have already been tried out.

Does an exec give a damn whether they earn 5 or 10 million?

It’s only true if you believe that the average human being is not opportunistic.

>that's because of the lack of trust, and the flow of responsibility and control.

I currently work for a company that has been transitioning from being tiny (I was employee #24) to pretty big (we're close to 60 now). One thing I've learned, much to my disappointment, is that once you get past a certain size it gets harder and harder to recruit people worth trusting with that kind of responsibility. The supply of such people is too limited, and they tend to get poached quickly.

>If you structure a company such that the employees themselves have to be responsible for their output in such a way that higher output leads to more money for them

Oh so all you need to do is fairly and reliably measure "output" in an ungameable way? Easy! /s

Just want to point out 60 is really not all that big, 500 is getting big, 1000 is huge.

If you're feeling that at 60, imagine what that must feel like...

50-60 is a transition point for small businesses. That's the point at which it literally becomes impossible to be indifferent to process or structure.

Prior to that size companies can kind of get by on luck, skill, or the "heroic efforts" of individual contributors to carry them along. Once you hit 50 FTE that approach starts to fail more and fail harder. This is why tons of small businesses flame out when they hit this threshold.

500 is just getting into medium size. I'd say you need to approach 10,000 before you can say large. You need tens of millions before I could call you huge (i.e. a country; depending on how you define organization, it might need to be a dictatorship to count for you).

I agree with your point, but I think your scale factors for size are off by orders of magnitude.

Thanks for bringing this up; I was a bit disheartened that an article basically talking about Goodhart's Law doesn't mention it.

EDIT to add: To be fair, the author's book (on which the article is based) mentions the term twice (and twice in the references). Still feels like this is inadequate, though.

https://books.google.com/books?redir_esc=y&id=rgs8DwAAQBAJ&q...