Hacker News
by lacker 1574 days ago
I don't agree with most of the advice in this article but rather than complain let me suggest an alternative.

As a line manager, with software engineers reporting directly to you, you should be able to use your personal judgment to understand the productivity of your software engineers. Don't measure it with acronyms, with metrics like the number of commits, or by paying attention to how many hours a week people are working. Pay attention to whether people get things done, and are they getting big important things done, or only little nice-but-not-critical things. Make sure you communicate enough so that individual software engineers understand how you think and what you prioritize.

As a manager-of-managers, it is going to be very difficult for you to measure developer productivity. It's tempting to look at metrics like the number of code reviews a developer does. But these can at most be a sanity check, not the core metric to go for.

Instead, you can measure productivity of teams. Is the team getting things done, and are they big important things, or only little nice-but-not-critical things? Sometimes, a line manager will insist that everyone on their team is performing excellently, and yet you observe the team overall is not achieving very much. Probably one of the two of you is incorrect, and you should dig in to figure that out. The opposite also happens, where a manager states that everything is a disaster, but you observe that the team has actually delivered a lot.

The other thing you can do is to teach your line managers how to judge individual productivity. There's no silver bullet, it's just a natural outcome of having conversations about who is productive and who is not and how to tell and what to do about it, so be sure to have enough of those conversations.

None of this is easy to quantify, but the hard truth is, there is no natural mapping from numbers to developer productivity and it is usually a bad idea to try to quantify productivity. You are much better off using human language and intelligent thinking to evaluate productivity, rather than reductionist metrics.


I too have come to think that no simple metrics will ever replace the need for a competent manager who can use intangible, subjective context to evaluate their team. I think that even if you get some metrics that work well initially, the system will change such that the metric becomes the goal and the metrics then become much less effective.
You still have a goal that people are optimizing for though--how to increase that intangible, subjective measure known only to the manager. It leads to people optimizing for talking about the stuff they do more, showcasing their accomplishments, and in bad extremes brown-nosing, infighting and sabotage of others' work. Whether any of those behaviors lead to a better product or outcome for the company is something to strongly consider.
Well put. I couldn't agree more.

Humans and relationships are nuanced, including work relationships and the responsibilities and expectations there. It's best to treat them as they are rather than trying to shoehorn those things into such a sweet little checkbox.

Frameworks are alright, but they need flexibility built in. They certainly shouldn't be treated as religiously as they are commonly.

I can't agree more with you. I tried to sum up my thoughts in my first reference:

> One of the most common myths — and potentially most threatening to developer happiness — is the notion that productivity is all about developer activity, things like lines of code or number of commits. More activity can appear for various reasons: working longer hours may signal developers having to "brute-force" work to overcome bad systems or poor planning to meet a predefined release schedule.

The SPACE framework is not about measuring quantitative data only. I feel the need to explain that certain metrics can be interesting, not as goals in themselves but as a way to identify key issues or unexpected events during engineering sprints. Without data analysis, you would not be able to understand why there is a drop in productivity during certain periods, and usually those drops were created by management (too many meetings or lack of follow-up).

I don't see how you can make the leap from "it's hard to measure" to "no metrics are useful". As with anything, you have to use your judgement and experience, and it's a case-by-case thing. Everything is a signal. Lines of code, number of bugs fixed, number of bugs found, severity of bugs, hours in office, meeting project milestones, contribution in team meetings, etc, etc. It's up to you how to interpret each signal. As you rightfully said, there is no silver bullet.
There's an entire class of products I'll name "internal platform tools" whose primary objective is to improve the developer experience with the intent of having the side effect of increased developer productivity by making it easier & more enjoyable to build things within a company. The teams working on these tools need to understand how their products perform the same as a team building some widget for a "paying" customer.

Without some quantifiable metric, how do these teams know if their products are getting better or worse? The discussion always goes to measuring developer happiness & developer productivity because we want with some degree of confidence to be improving or at least maintaining these metrics.

Inefficiencies in the developer experience show up as frustrations for developers. Developers are very happy to tell you what frustrates them and how badly.

Often what frustrates people is a latency, which is something you can measure and track. Other times it is an ugliness, surprising footgun, or lack of conceptual integrity - these are fundamentally human experiences, and subjective assessment is the only way.

Agreed, we need both quantifiable metrics, and also a human brain to interpret them with subjectivity, context and compassion.

I see many people wanting to take writing code into the liberal arts domain, but I am of the opinion that it may be more useful if we can overlap it with the engineering domain. IMHO the goal should be to repeatedly churn out high-quality bug-free code, and to create an objective process methodology so that time and money are well spent. We may end up with multiple different methodologies for various technologies, domains, etc.

The part where the code needs to execute correctly is the engineering domain. But lots of bad code executes correctly. "Programs must be written for people to read, and only incidentally for machines to execute." Writing things for people to read is unavoidably an arts discipline.
>Writing things for people to read is unavoidably an arts discipline.

I understand what you mean, but I'd have to disagree with that. I've been working on a very large engineering project with a large-ish team (~50 members) for over two years. All of our communication is via an established methodology of engineering diagrams, design documents, position papers, etc., all of which are in a structured format that follows common rules/regulations/conventions.

As a result, all the companies we work with understand, for example, our P&ID diagrams, electrical schematics, mechanical design docs, system layout diagrams, etc. There is no reason why such a methodology can't be brought to code. I don't view code as anything special, having now worked on both sides. I think there really is a lot of value in the engineering methodologies that can be adapted and applied to the software world.

Do your diagrams, design docs, and position papers not vary in the clarity of their presentation or in the wisdom/simplicity/fitness-for-purpose of the ideas they convey?
Indeed, they vary. The larger point is we still get a lot accomplished/communicated and done because of a common underlying methodology. I don't have an answer for what that means when adapted to the software field. It's going to be a soup of many things - coding guidelines, BDD, TDD, modular programming, etc, etc. I'm sure there are brains far bigger than mine already working on this; it's not really an original idea in that sense.
Generally you find the problems in your developer experience, and use those as your metrics. Maybe it takes three PRs and an hour and fifteen minutes to deploy to prod; then fewer PRs and fewer minutes to deploy would be your metrics.

Or maybe to introduce a new endpoint in your API takes X amount of boilerplate lines, Y files, etc and you post mortem new endpoints after your change to ensure that number is dropping.

Talk to people, find out what their problems are, quantify the problem, measure.
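
To make that concrete, here's a rough sketch in Python of tracking those two numbers before and after a change. The `Deploy` record, its field names, and the sample figures are all made up for illustration; in practice you'd pull this from whatever your CI/CD system actually records.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Deploy:
    """One production deploy (hypothetical record shape)."""
    prs_needed: int            # PRs it took to get the change to prod
    minutes_to_deploy: float   # wall-clock minutes until the change was live


def summarize(deploys):
    """Return the median PR count and median minutes for a list of deploys."""
    return {
        "median_prs": median(d.prs_needed for d in deploys),
        "median_minutes": median(d.minutes_to_deploy for d in deploys),
    }


def improved(before, after):
    """True if both medians dropped (lower is better for both)."""
    b, a = summarize(before), summarize(after)
    return (a["median_prs"] < b["median_prs"]
            and a["median_minutes"] < b["median_minutes"])


# Illustrative numbers only, not real measurements.
before = [Deploy(3, 75), Deploy(3, 80), Deploy(4, 70)]
after = [Deploy(1, 20), Deploy(2, 25), Deploy(1, 30)]

print(summarize(before))
print(summarize(after))
print(improved(before, after))
```

Medians rather than means are used here so one pathological deploy doesn't swamp the picture, but that's a judgment call too.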

We just need to be careful that this unquantifiable, subjective rating doesn't include biases.
Everything including "objective" metrics includes bias. And that's before you take into account people outright gaming metrics (objective or subjective).
As a simplified example, if I write 1000 lines of code and you write 1000 lines of code, we should have the same rating if that's the metric used. There shouldn't be any bias there. It only introduces bias when the manager feels your code is better than mine, etc.

Now the objective measure itself might have some sort of bias, but at least the rules are set and you're not getting screwed over based on someone's feelings. You can argue metrics; you can't argue your manager's feelings.

The thing is that those metrics are very poor metrics that don't correlate well to the "true ideal performance", even if compared to a subjective manager's intuition with all the randomness and biases.

Replacing a subjective metric that's at least somewhat effective with a metric that's totally useless (because those inherent inaccuracies/biases are even worse than even a poor manager's judgement), that's throwing out the baby with the bathwater. The primary purpose of a performance metric is to measure performance, and being prejudice-resistant is something that's nice to have - the primary reason why you implement a metric is not because you need something that can be argued.

But that is proper. The quality of craft/creative work matters. It belongs in the evaluation of craftsmen and creative workers. And it is fundamentally a feeling. When you are junior you may not yet have developed this judgement or taste. Your job is to learn it, from your own failures and the feedback of your senior colleagues. When you are senior, you have it. You are more valuable to an organization precisely because you can be trusted to have positive feelings about good work and negative feelings about bad work, and therefore do the right thing in a position of decision-making power. Also because you enculturate the next generation of senior craftsmen through your feedback.

This shouldn't be a surprise at performance review time, nor should it necessarily come from your manager -- it should be coming from your senior colleagues on each of your code reviews, giving you a chance to improve your bad code before it gets checked in. But when your senior colleagues think your PRs are worse on average than those of your peers, then yes absolutely you should get a worse rating.

Your comment doesn't change if you replace lines of code with manager's perception of you. If you're both equally liked by your manager then you should receive the same rating. Within the metric being defined neither is biased since they have clear and explicit definitions. Against the true metric of "productive engineer" both are biased.
And how do you handle your manager having a cultural or unconscious bias against your <race / religion / body type / gender / appearance / clothing / hair color / fragrance of the soap you use / eyewear / etc.>? You just live with them not liking you and not measuring up to others in their mind?
Subjective evaluation for performance purposes is often done by committee for this reason. Your work is read by several people who are unlikely to have the same idiosyncratic biases, at least some of whom don't know you. (That cuts both ways, though; they also don't know the context for the work).
Except you have no measure or target for the manager's feelings.

You have to define productive engineer in order to claim the metrics are biased.

If your 1000 lines of code generate four new bug tickets and mine doesn't generate any, is that biased to say mine is better?

Or how about even if yours generates 10 comments on the PR correcting things to match code quality guidelines and mine doesn't?

I don't think we often track things like that.

In order for these metrics to have even a tiny tiny chance of not being completely gamed (even unintentionally) you'd have to define a rigorous formula of weighted metrics that take things like one of my siblings mentioned into account (did your 1000 lines create a regression or 5 and mine didn't? Code quality? Lots of review comments that took forever to debate and resolve?). And that's assuming you could actually measure those things properly. Was that comment a valid one regarding you missing quality guidelines or was it someone trying to game your metrics negatively so that he'd look better?

I think it's impossible to create something like that and it'd be very very bureaucratic and still prone to gaming. I think having something 'in between' is the best approach. You still allow a manager to interpret these things together with you but the manager should give you a guideline for what to look out for. We can use these metrics to inform decisions about performance but it's completely counterproductive to simply have a few metrics where you have to hit specific numbers.

PR throughput? No problem, I'll form a clique of a bunch of people that OK each other's tiny PRs. This will result in so much overhead that we won't actually get much done, piss off other team members, create a hell of basically unusable commits, and make it more likely that code quality suffers because nobody has any chance of having an overview of what you're doing overall. You will also likely create regressions that developed over multiple commits and would've been caught otherwise because, let's face it, each unit test you write is its own PR. You say obviously you won't get through with this because your manager is supposed to stop that? Well, he can't if we just want a completely objective and metrics driven approach!

With the hybrid approach, you know from your manager that PR throughput is important not at the expense of quality and other things. You want small PRs for certain reasons but not at all costs. There is no exact formula because no two situations are exactly the same. Of course bias comes, of course bad managers make this bad. So does a completely "objective" metrics driven environment in which you play the metrics game. There is no perfect solution.

Exactly, I don't know what the answer is (probably a combo of subjective and objective measures, along with a healthy dose of independent oversight for both) but relying completely on 'gut feeling' is an express train to unconscious bias land.

In reality I think someone who only looks at team members who are "doing the most" is really just measuring who is talking about their work the most. You need some kind of objective measures like features shipped, assigned bugs resolved, etc.

I don't think adding metrics is actually a good approach to reduce bias. Any kind of measurement can be twisted if you want to.

Also, the metrics you choose will undoubtedly contain a measure of your bias anyways.

For instance, the metrics two people would choose to represent developer effectiveness will not be the same, and those differences will reveal what kinds of workers they prefer.
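
A toy illustration of that point in Python: the same raw numbers, scored with two different sets of weights, produce opposite rankings. The developer names, metric names, counts, and weights are all invented for the example.

```python
# Raw counts for two hypothetical developers (made-up numbers).
stats = {
    "alice": {"features_shipped": 2, "bugs_fixed": 12, "reviews_done": 30},
    "bob": {"features_shipped": 6, "bugs_fixed": 3, "reviews_done": 5},
}

# Two evaluators who value different things pick different weights.
weights_a = {"features_shipped": 5.0, "bugs_fixed": 1.0, "reviews_done": 0.2}
weights_b = {"features_shipped": 1.0, "bugs_fixed": 2.0, "reviews_done": 1.0}


def score(dev_stats, weights):
    """Weighted sum of one developer's counts."""
    return sum(weights[name] * value for name, value in dev_stats.items())


def ranking(all_stats, weights):
    """Developer names ordered from highest to lowest score."""
    return sorted(all_stats,
                  key=lambda dev: score(all_stats[dev], weights),
                  reverse=True)


print(ranking(stats, weights_a))  # ['bob', 'alice']
print(ranking(stats, weights_b))  # ['alice', 'bob']
```

Neither set of weights is "the objective one"; each just encodes what its author thinks matters.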

I prefer some kind of metric because I'm tired of being screwed over by blind shitty managers.
I hear you. But metrics aren't going to save you. You need to find a manager who isn't shitty! They are out there, don't lose hope.
There aren't many though and almost none who haven't come up through the trenches themselves.
Absolutely agree.

I think there is a conflict of interest there, though. Managers have a vested interest in saying that their team is highly productive. Managers of highly productive teams get raises and more head count, and eventually promotions. Anything else reflects poorly on the manager.

So the managers-of-managers do need to keep their eyes on this too, but I also agree with you that it's harder for people in that higher-level position to evaluate this. I guess, as you hint at, the manager-of-managers can look at team output overall, and if that's below expectations, that's a starting point for discussion with the line manager.

Measuring team output seems just as difficult as for an individual.

I was reading an interesting piece in The Economist today about maintaining people's performance on a long space flight. One part that stood out to me is that people being productive makes them happy. I always assumed the causation would be the opposite way round. Perhaps the best a manager of managers can do is try and figure out if the members of each team are happy or not.