by koliber 918 days ago
OKRs are a goal setting framework for non-trivial 3-12 month goals.

Some people are self-motivated and well organized, are great at communicating progress proactively to other stakeholders, and understand the idea of cross-departmental alignment. OKRs will not help them.

For everyone else, OKRs are a tool that can help accomplish those things.

PS I actually like OKRs and after a lot of effort, learned how to make them useful. I did not get it at first either.


The problem for me is the “key results” part of OKR’s. It means that if it’s not measurable, it’s not an OKR, and in too many organizations, if it’s not an OKR it’s not worth doing.

Cleaning up your code base to accommodate all the gradual accumulation of small fixes/hacks is not something you can put a metric on, at least not without pulling numbers out of your ass. But everyone agrees that you can’t really have quality software without doing this. But OKR’s would say that making nothing but tactical changes to ship features and never revisiting architecture is perfectly great. The incentives always seem to push you towards tech debt.

OKR proponents would say that revisiting architecture and paying down tech debt should be implicit and part of the process of achieving your results, but I’ve never seen it done. Or rather the only time I have seen it done is when someone tries to shoehorn the refactoring work into an OKR in order to justify it, making up bogus metrics, getting the OKR dropped because it’s not meaningful enough, and then just working on it anyway.

"Cleaning up your code base to accommodate all the gradual accumulation of small fixes/hacks is not something you can put a metric on" - why not? Generally the need to refactor is driven by something - getting too hard to make changes? productivity down? introducing more errors than we used to? In those cases improving productivity or reducing errors is the result we're targeting, and cleaning the code is the activity we do to achieve the result.
> In those cases improving productivity or reducing errors is the result we're targeting, and cleaning the code is the activity we do to achieve the result.

Right, but please read the first sentence of my post. It’s the “key results” part that’s hard. Because you need to quantify all the benefits you’re targeting, giving them a number, so that you can show whether you completed your goal or not. If you say “productivity will go up”, you have to put a number on the current productivity, then give regular reports on what happens to that number after the refactoring. What do you pick? Number of PR’s merged per day? That’s probably going to go down, because most of the PR’s today are small tactical band-aid fixes. So do you say the number of PR’s merged should go down after the refactoring? That could just as easily be because the refactoring made things worse and everything’s so broken that nobody can make changes. So PR’s merged is a shitty metric. What else? Bugs filed? In a product where you’re growing users you’d probably expect them to go up due to increased usage, so any benefit to refactoring is likely going to be lost in the noise. Line of code count? Please.

More often than not people just pull whatever metric they want out of their ass to make the case for what they’re trying to accomplish, and cherry pick things so that it looks better after the effort. But it’s against the spirit of OKR’s to do this, which is why OKR’s are bad for anything “fuzzy” like refactoring. You have to shoe-horn work that everyone agrees is worth doing, into a framework that isn’t designed for refactoring work, to make the case.

You are hitting on an important and difficult aspect of OKRs. Getting the alignment between what you can affect (the leading indicator) and what the outcome is (business value).

It's not an exact science. You can make arguments for and against different things that could conceivably be measured. This is where experience and strategic thinking help.

You can always come up with a risk or a reason why a particular measurement won't affect the desired outcome. You will be more right on some and less right on others. However, throwing out the entire OKR approach because you can not be sure is not correct either.

If refactoring code doesn't lead to fewer defects down the road, or to faster feature implementation with fewer errors, or faster employee onboarding, or any other visible result, then maybe management doesn't want you to do it, and maybe it's reasonable to consider why.
It does all of those things.

But it’s really really really hard to quantify it in a measurable way. Which is what OKR’s force you to think about: what is the metric, what is its current value, and what is your goal for the metric, so you know whether you achieved it?

Can you quantify “faster employee onboarding”, reliably? Can you graph it over time? Can you quantify “faster feature implementation” in a way you can actually measure that isn’t sensitive to the fact that all features are different?

The key results are not a problem just for you! You are hitting the nail on the head. Check out section 6 of this study ( https://arxiv.org/pdf/2311.00236.pdf ) - defining good OKRs is problem number 1, and data issues are the 2nd most cited concern!

I wrote a piece about the common issues that people face creating OKRs. There are a few common mistakes that make key results unmeasurable: https://koliber.com/articles/top-okr-mistakes

> the gradual accumulation of small fixes/hacks is not something you can put a metric on

I've done it before. On one team, we had a goal to reduce the number of linting errors and warnings from 18,000+ by 50% (while not growing the number of IGNORES). The team was reluctant at first, because "it's only linting and it does not matter." But they relented and eventually started fixing things here and there. And the number started going down, albeit slowly. And over time we got the number of linting errors down to 18 (or something close), because people found time here and there to improve things. And the team learned how to use OKRs. And they put in place a style guide and an auto-linter. And they started using it so that the errors did not come back. And there were plans in place to put in more sophisticated style analysis and run another OKR against that.

They literally matured in their code development practices way beyond just linting, just because of the relentless drive on one seemingly insignificant OKR.

This is just one example. You can use OKRs with engineering metrics to improve lots of things:

- fix the top 10 Jira tickets tagged with #techdebt

- reduce linting errors by 20%

- reduce number of functions with a cyclomatic complexity of 10+ by 50%

- research 5 static code analysis tools

- increase unit test code coverage from 56% to 62%
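As a rough sketch of how a numeric key result like these could be scored each week: the `progress` helper and the mid-quarter readings below are made up for illustration and are not from any OKR tool. The coverage numbers come from the list above, and the linting numbers from the 18,000-by-50% goal mentioned earlier.

```python
def progress(baseline: float, target: float, current: float) -> float:
    """Fraction of the way from baseline to target, clamped to [0, 1].

    Works for metrics that should go up (coverage) or down (lint errors).
    """
    if target == baseline:
        raise ValueError("target must differ from baseline")
    fraction = (current - baseline) / (target - baseline)
    return max(0.0, min(1.0, fraction))


# Hypothetical mid-quarter readings, for illustration only.
coverage = progress(baseline=56, target=62, current=59)
lint = progress(baseline=18_000, target=9_000, current=13_500)

print(f"coverage KR: {coverage:.0%}")
print(f"lint KR: {lint:.0%}")
```

The clamp is a design choice: overshooting the target (say, getting lint errors down to 18) still reports 100% rather than something like 200%.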

You can go many different ways. I've helped engineering teams do this well, starting with deciding what makes sense to improve and getting buy-in, through defining the OKRs, building the system of measuring it, and most importantly, driving the OKR every week.

In the case you cited, with a bunch of hacks, I'd approach it like this:

- Create an OKR like "Reduce tech debt".

- One of the key results would be "Identify 50 hacky places in code, and create Jira tickets for them, by Jan 31st" or something similar.

- A second key result would be "Refactor XX out of the 50 hacky places identified by Jira tickets, by March 31st"

Pick numbers that work for you.

Your advice could be generalized into:

- Take whatever it is you want to do and break it down into N jira tickets

- Make an OKR saying “solve these N jira tickets by date X”, with the result indicator being “number of those particular jira tickets solved”

- At the end, your OKR percentage is some fraction of N

This works regardless of what the thing you’re trying to do is. It goes against the spirit of OKR’s which is to use metrics that matter to the business (number of users onboarded, page load time, conversion percentage, etc) to justify work. That’s what the “results” in OKR’s are supposed to mean.

Correct. Breaking something into N tickets is one way of approaching OKRs.

It does not go against the spirit of OKRs. Reducing tech debt and making a metric out of the number of Jira tickets can work, and is a workable approach if there is business value from reducing tech debt. If you can align it to "reduce page load time", why would you not use it? Don't conflate business value with how you measure things. OKRs should align to business value. OKRs should be measurable. You can have things aligned to business value that are harder to measure. You can have measurable things which provide little business value.

There is no rule that says that you can not measure the number of tasks that get completed as part of an OKR. It's true that the smaller N gets the less sense it makes, and that N=1 is a binary goal. OKRs are better for larger N numbers, as those show progress better. Going from memory, "Measure What Matters", the OKR bible, has examples of OKRs where the goal

Nothing is stopping you from using OKRs for small N. But I have seen people come up with all sorts of excuses why "it won't work" so your mileage may vary. My suggestion is always "try it wholeheartedly before you knock it."

The generalization won't work once N is large, or is continuous, or does not make sense as separate Jira tickets. Luckily, it does not need to and you can track such metrics without the help of a ticketing system.

Examples that won't work as Jira tickets but can be good OKRs, if they align to a business goal:

- improve the Core Web Vitals cumulative layout shift (CLS) by 0.3 points. (can align to "reduce bounce rate" as CLS affects the perceived load time and quality)

- increase test coverage by 15% (can align to "reduce churn", if churn is caused by poor product quality, and test coverage can improve quality)

What's the benefit of OKRs as a system in your example? You're essentially just creating a list of to-dos to check, no?
Key results can be more granular and less granular. Think of it as a continuum. Some are continuous:

- Improve the time to first byte for the homepage by 500ms.

This one is continuous, because time is continuous. Realistically it will be quantized into milliseconds, but that's nitpicking.

You can get less granular:

- solve 500 linting errors

This one is kind of continuous, but there are 500 distinct steps. Each week it is feasible that you can solve a handful, and can see movement and improvement.

- add 5 unit tests to XYZ module

Now we are getting much less granular. It is a checklist of 5 TODOs. But you can track the progress. It's unlikely you will make an improvement every week, but on some weeks you need to if you want to hit "5" by the end of the quarter.

- hire a new DevOps engineer

This is a binary, checkbox, or hit-or-miss key result. Sometimes they make sense. It's not great if they make up the majority of the key results on an OKR. The good news is that you can make it more granular. Create a plan for hiring an engineer. Break out the steps, assign a percentage to each step, and track it as a 0-100% key result. This way, as you write a job description, post it, create a pipeline, review resumes, and hold interviews, you can track and share the progress.
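A minimal sketch of that breakdown, assuming the five steps named above. The weights are arbitrary numbers picked for illustration, and `kr_percent` is a hypothetical helper, not part of any OKR tool.

```python
# Hypothetical hiring plan: step name -> share of the key result (sums to 100).
HIRING_STEPS = {
    "write job description": 10,
    "post the job": 10,
    "create candidate pipeline": 20,
    "review resumes": 20,
    "hold interviews": 40,
}


def kr_percent(steps: dict[str, int], done: set[str]) -> int:
    """Sum the weights of completed steps to get a 0-100% key result."""
    if sum(steps.values()) != 100:
        raise ValueError("step weights must sum to 100")
    unknown = done - set(steps)
    if unknown:
        raise KeyError(f"unknown steps: {sorted(unknown)}")
    return sum(weight for name, weight in steps.items() if name in done)


# After writing and posting the job description:
print(kr_percent(HIRING_STEPS, {"write job description", "post the job"}))  # 20
```

Pick weights that reflect where the effort actually goes for you; the point is only that the weekly number moves before the hire is made.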

Todos are binary, OKRs are (supposed to be) measurable continuously. I.e., we only did 5 of the top 10 tech debt tasks so the goal was 50% achieved