I'm one of the co-authors of the study. Your critique is valid, but by research standards our sample is sufficient for this type of study. We are planning to replicate the study on a larger scale in the future, though!
Are there any plans to figure out objective ways to measure productivity and what distinguishes “good devex” from “bad devex”?
I’ve worked at a lot of big tech companies that run surveys about internal tooling, and every year tooling is rated as a weak spot. Across years and companies this seemed like a consistent trend.
And yet every one of them had teams dedicated to improving various aspects of devex, so it’s unclear whether these teams are just improving the wrong things, or whether productivity really is improving and something else is going on. For example, code debt may grow faster than devex improves, people may be asked to go faster than the devex improvements can keep up with, or devex is being improved but only for smaller subsets of the engineering org, so in a survey that large not enough people feel it.
That’s another thing to be mindful of when comparing large-scale and small-scale surveys: the latter might be sampling specific teams that adopted the tool, whereas the former might find there’s no way to make everyone happy and it all turns into a wash.
> Are there any plans to figure out objective ways to measure productivity
You can't measure developer productivity objectively, assuming you're referring to infamous metrics like lines of code, number of pull requests, or velocity points. There's broad agreement on this both within the research community and among practitioners at leading tech companies.
> You can't measure developer productivity objectively, assuming you're referring to infamous metrics like lines of code, number of pull requests, or velocity points. There's broad agreement on this both within the research community and among practitioners at leading tech companies.
No, those are the metrics you’re suggesting. There are better ones, as someone else mentioned (time to get code from PR to production, some way of measuring the quality of work reaching production, etc.). Yes, the obvious metrics are poor, and better metrics are difficult to measure and quantify. And obviously no single metric is going to capture something as multidimensional as code development.
Also, the link you reference doesn’t support your argument.
> Across the board, all companies shared that they use both qualitative and quantitative measures
Throughout, it discusses how people do use quantitative metrics to help guide their analysis, and none of them use the obviously naive ones you mentioned.
This isn’t intended as a critique, but in an engineering profession, anything that isn’t quantifiable is open to interpretation and argument. That weakens forward progress, restricting it to whatever gets adopted as an industry standard, which is in many ways more a popularity contest of fads than concrete technical improvement.
Asking people whether the developer experience is good or bad is not going to be the most efficient approach: it's ultimately asking for a mood. When teams are asked what they spend a lot of time on that they wish they didn't, you can at least see where the heavier pain points are. It doesn't help if your developer experience budget is zero, but it can at least help prioritize the useful alternatives.
In most places I've worked, a survey asking for specific pain points gets great results, because the worst time sinks stick out like a sore thumb, especially if you have workers who have previously worked in high-quality organizations.
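A minimal sketch of the tallying step for a pain-point survey, assuming each respondent lists a few pain points with a rough estimate of hours lost per week. That survey format, the function name `rank_pain_points`, and the sample answers are all hypothetical:

```python
from collections import defaultdict


def rank_pain_points(responses: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Sum self-reported hours lost per week for each pain point and
    return (pain_point, total_hours) pairs, largest total first."""
    totals: defaultdict[str, float] = defaultdict(float)
    for answer in responses:
        for pain_point, hours in answer.items():
            totals[pain_point] += hours
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical survey answers: one dict per respondent.
responses = [
    {"waiting on CI": 3, "flaky tests": 1},
    {"waiting on CI": 4, "environment setup": 2},
    {"flaky tests": 2},
]

for pain_point, hours in rank_pain_points(responses):
    print(f"{pain_point}: {hours:g} h/week")
```

Summing reported hours rather than counting mentions weights each pain point by its estimated cost, so a widespread but minor annoyance doesn't automatically outrank a large time sink.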
Those places did detailed surveys, but the results were uninteresting; it’s always something abstract like long compile times or long CI times. Then the next year the company presents all the concrete ways it made things faster, and the same themes repeat.
There are three problems. The first is that the people who can accurately pinpoint the problem are outnumbered by people who are just unhappy and generate a response for the sake of participating because they were prompted. This means you can address the pain point you think you’ve identified only for sentiment to remain unchanged, so you tackle the next point and the cycle repeats.
The second is that something like “slow compile times” may have hundreds of different causes. If you improve compile times in aggregate by 20%, you haven’t solved anyone’s specific day-to-day compile-time pain point. The compile where someone normally spends around 10s drops to 8s, but the expensive compile that runs infrequently (both because it’s needed less often and because it’s so slow) takes 1-4 hours. It might see a reduction into the 48 min to 3.2 hour range, which is substantial but not enough to be felt, or it may be unaffected because the improvement work isn’t measuring that slow build and it isn’t a focus (e.g. maybe it has a bad dependency chain that pulls in far too much). The reasons that build is slow can be hard to tease out correctly, and engineers are incentivized to make the “biggest bang for the buck” changes that sound impressive and quantifiable (“20% reduction in compile times across the company” versus “I made this team happier, and it might show up in an end-of-year survey if I’m lucky”).
The third is that the rate at which certain kinds of things get worse (long CI and compile times) keeps pace with, and usually outpaces, the rate at which they get better (e.g. 1000 developers adding to compile and test times can’t be outpaced by a team of 10 engineers spending their time on speeding things up).
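To make the compile-time arithmetic concrete, here is a small Python sketch that applies a 20% reduction to the build durations mentioned. It assumes the improvement applies uniformly to every build, which real improvements rarely do; the helper names `reduced` and `fmt` are hypothetical:

```python
def reduced(seconds: int, percent: int = 20) -> float:
    """Return a duration after a uniform `percent` reduction (an assumption:
    real speedups are rarely spread evenly across all builds)."""
    return seconds * (100 - percent) / 100


def fmt(seconds: float) -> str:
    """Format a duration in seconds, minutes, or hours, whichever fits."""
    if seconds < 60:
        return f"{seconds:g} s"
    if seconds < 3600:
        return f"{seconds / 60:g} min"
    return f"{seconds / 3600:g} h"


# Durations from the example: a routine build and the rare, slow build.
builds = {
    "routine build": 10,
    "rare build (low end)": 1 * 3600,
    "rare build (high end)": 4 * 3600,
}

for name, before in builds.items():
    after = reduced(before)
    print(f"{name}: {fmt(before)} -> {fmt(after)} (saves {fmt(before - after)})")
```

The routine build saves 2 s, too little to notice, while the rare build still takes between 48 min and 3.2 h afterwards, so neither end of the range feels meaningfully different day to day.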
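The third point can be illustrated with a toy model. This sketch assumes a purely linear relationship: each product developer adds a fixed amount of CI time per week and each platform engineer removes a fixed amount per week. The function `project_ci_time` and every rate below are made up for illustration; only the 1000:10 headcount ratio comes from the comment:

```python
def project_ci_time(weeks: int, start_ms: int, developers: int,
                    added_ms_per_dev: int, platform_engineers: int,
                    removed_ms_per_engineer: int) -> list[int]:
    """Toy linear model of CI duration over time (hypothetical rates).

    Each week, product developers collectively add build/test time and the
    platform team removes a fixed amount proportional to its headcount.
    CI time is clamped so it never drops below zero.
    """
    times = [start_ms]
    for _ in range(weeks):
        added = developers * added_ms_per_dev
        removed = platform_engineers * removed_ms_per_engineer
        times.append(max(0, times[-1] + added - removed))
    return times


# Hypothetical: 1000 developers each add 100 ms/week; 10 platform engineers
# each remove 6 s/week, starting from a 30-minute pipeline.
history = project_ci_time(weeks=52, start_ms=30 * 60_000, developers=1000,
                          added_ms_per_dev=100, platform_engineers=10,
                          removed_ms_per_engineer=6_000)
print(f"start: {history[0] / 60_000:.1f} min, "
      f"after a year: {history[-1] / 60_000:.1f} min")
```

Under this linear assumption the platform team only breaks even when `platform_engineers * removed_ms_per_engineer >= developers * added_ms_per_dev`, so with a 100:1 headcount ratio each platform engineer has to remove 100 times what each product developer adds.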
Having been on the teams that improve developer experience, the problem is that one of my hands gives while the other takes away. I can address every complaint a developer has about the company platform, but at the same time requirements change. As a company grows, it starts caring more about security and firewalling between different data and services, which makes development harder and more annoying.
For your first question about good devex, there are definitely some objective ways to measure it.
* The time it takes for completed code to be deployed to production
* The count of manual interventions it takes to get the code deployed
* The count of other people who have to get involved in a given deploy
* How long it takes a new employee to set up their dev environment, and the count of manual steps involved
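A sketch of how the first three metrics could be computed, assuming you can export one record per deploy with merge and production timestamps plus counts of manual steps and people involved. The `Deploy` schema, its field names, and the sample data are hypothetical; real data would come from your own CI/CD and change-management systems:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deploy:
    """Hypothetical record of one change going to production."""
    merged_at: datetime         # when the completed code was merged
    deployed_at: datetime       # when it reached production
    manual_interventions: int   # manual steps someone had to perform
    people_involved: int        # people besides the author who had to act


def summarize(deploys: list[Deploy]) -> dict[str, float]:
    """Aggregate per-deploy records into the metrics from the list above."""
    lead_times_h = [(d.deployed_at - d.merged_at) / timedelta(hours=1)
                    for d in deploys]
    n = len(deploys)
    return {
        "median_lead_time_hours": median(lead_times_h),
        "mean_manual_interventions": sum(d.manual_interventions for d in deploys) / n,
        "mean_people_involved": sum(d.people_involved for d in deploys) / n,
    }


t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deploy(t0, t0 + timedelta(hours=2), manual_interventions=0, people_involved=0),
    Deploy(t0, t0 + timedelta(hours=6), manual_interventions=1, people_involved=1),
    Deploy(t0, t0 + timedelta(hours=30), manual_interventions=3, people_involved=2),
]
print(summarize(deploys))
```

The median is used for lead time because a few deploys stuck over a weekend would otherwise dominate a mean. Onboarding time could be tracked the same way with one record per new hire.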
Having done internal developer & analyst tooling work (and used DX), this type of survey is great for internal prioritization when you have dedicated capacity for improvements.
I'd be curious to see more about organizational outcomes, as this is the piece of DevOps/DevEx data that I feel is weakest and requires the most faith. DORA did some research here, but it's still not always enough to convince leadership.
For the uninitiated among us: can you share more context on the research standards and the reasoning behind them? I'm interested and would like this to influence some decisions I'm making, but I'd like to understand the confidence level here :).