Hacker News
by gorgoiler 1524 days ago
“My emails produced only well-worded refutations. They explained quite factually why the setup is the way it is, and implicitly therefore why it could not change.”

This landed so truly for me, it felt like a punch in the stomach.

I wouldn’t dare count the number of times I’ve been told the technical details of why something is the way it is, without anyone ever saying the reason why we actually wanted it to be this way. My thesis was usually: we don’t.

In my career I feel like I have seen hundreds of examples of me saying the systems equivalent of “let’s put the dining table indoors?” to be told that the dining table is outside because the original budget meant the front door could only be yay wide so we had to leave the table in the yard and put a tent over it. And I’m just left standing there agape at how we eat in a cold wet tent every night instead of fixing it.

Except it’s usually more like: why do we have to spend $9k on a commercial dishwasher repair contract? Because we have a commercial dishwasher … to get the rust off the silverware … because we eat outdoors every night … because the front door was too small to get the dining table in the house.

Somehow, when the real examples of this stuff are clever engineering around build / docker / polyrepo / release / feature flags / third party bugs, the cleverness makes people think the existence of the workaround should be tolerated. It’s infuriating to join a new team held hostage by years and years of band aids because they never suffer the bigger picture consequences.

The whole article was fantastic. I hope the author has the engineering leadership role they deserve. We need more people like this.

Devil's advocate:

If you spend all your time refactoring, cleaning up after legacy design constraints, fixing ossified errors, then you run out of time and fail to write actual meaningful income generating features. Conversely if you make none of those improvements, eventually the weight of bad architecture slows all progress to a halt and no new income generating features get delivered.

One of the hardest parts about advancing as a developer, in my opinion, is being able to tell when you should refactor versus just leaving the old working mess alone.

With your example, it's like if it turns out that the dining table is embedded into the ground with concrete because it kept blowing over, and moving it indoors would require getting a carpenter to create new legs. And also that because the dining table has been there for so long, someone decided to run electricity cables through it, so rerouting it requires an electrician and will shut down the factory at the bottom of the garden for half a day. We could buy a separate table for indoors, to try and slowly migrate to the new table, but then we'd have two tables to maintain and we all know how that usually goes.

At a certain point, you look at it and go well the commercial dishwasher is just $9k and we can focus our efforts on building that loft conversion for now.

This is a naive question from someone who hasn't been involved in project management in almost a decade, but why can't feature development and refactoring be split across two different ownership groups?

The first group writes the initial version and all iterations (extra features) up to the point where the expected returns from quickly pushing out future iterations are less than the amount of required effort.

The second group then comes in and does a complete refactor, without changing the look or feel of anything that the customer actually wants. Meanwhile, the first group moves onto "the next big thing".

There’s a parallel comment warning of the second system syndrome, but I’d like to point out the problem of “who gets the credit?”

Google famously suffers from this: the pm who launches a product gets promoted; the person who adds a feature gets some credit in the employee review and the person who fixes bugs is judged to have wasted their time.

I suspect Apple has this problem too (they definitely prefer reimplementation rather than evolution in many cases) but their processes are more opaque.

Who would want to be on the cleanup team when the glory goes to the path breakers?

> Who would want to be on the cleanup team when the glory goes to the path breakers?

This and the second system syndrome are both organizational issues. Why couldn't an organization simply freeze the customer-facing portion of the application (so no UX or added features) and tell a group of developers responsible for the refactoring that they will be judged on a set of achievable metrics, such as decreased infrastructure costs or better performance?

This would fall squarely within common managerial frameworks (it's basically Tuckman's group development model, or what you see at many startups that launch an MVP), except that the initial application development is handled by a different group of 'high performing' developers.

Because there are benefits that aren’t quantifiable. Developer velocity is incredibly challenging to measure objectively.

These initiatives have to be top-down priorities in the organization, with agreed upon importance.

1) The reason why such a refactor might be necessary is from things the first group tried but didn't quite work as intended, or that the users used the system differently to intended. The first group has that knowledge, the second group doesn't. So the first group will do the refactor better than the second group.

2) Beware of Second System Syndrome https://en.wikipedia.org/wiki/Second-system_effect where everyone tries to put in every feature that was missing from the first system, simply because there is no urgency around the second system, because the first system is already running.

Battle plans never survive first contact intact and software never survives customer deployment intact. Especially in systems that are ever evolving.

so true, in my experience.

The "rockstars" who write the first system will be despised by the "clean crew" who have to maintain it after it's no longer easy to add features...

That's a great elaboration, very accurate, and funny! But in reality that table would always get fixed; the only reason such absurdities remain in IT systems is that they're not immediately visible.

I disagree completely - most people know about the problems and they've accepted them instead of trying to gin up the organizational effort to fix them, because the last time they tried that they either became responsible for the cleanup or got smacked down by someone who should have been but got it wrong.

Sometimes they even have to create a separate project within the organization to clean up the mess the owning team won't take on:

https://github.com/Microsoft/VisualStudioUninstaller/release...

The choice between refactoring and money-generating work is a false dilemma. There are other options, and the developer doesn't have to make that decision or carry out the work all on their own.

Indeed.

If the code has turned to spaghetti then how do you manage to change code quickly (due to e.g. Corona rules) so you can follow where the market went and not get competed out of business?

When the company is in startup mode and has no customers, it's easy to just throw more mud on the wall.

But when you have an existing business based on 1M lines of code and you want to stay in the market when the market changes quickly, then spaghetti code can be death. Being ready means having cleaned up the code beforehand so it is easy to change.

At an organisational level, you have to make a decision on how much time you spend doing one or the other. It might be that some developers never do any refactoring but someone is always going to end up doing it. Or nobody does it, and the code slowly decays.

Unless you're saying that you don't have to do refactoring at all in the organisation, but the only way to do that surely is always get it right the first time, which isn't hugely practical. You may sometimes encounter a situation where the quickest way to build a feature is to fix some old ugly code, but that's certainly not the case every time.

My whole point is that you don't, because there aren't always just two options. That's the false dilemma logical fallacy.

I'm saying you can fix problems without dropping everything and redoing work. You're allowed to problem solve and work with people to create a third option. And you can prevent new ones by learning and strategizing.

Well whether you drop everything or clean as you go or whatever other strategy, fixing stuff takes time. Even if it's just the mental effort of designing a better way and consensus building.

I'm just using simple analogies for the sake of explanation, but it is nearly always the case that expanding the scope of work to fix previous architectural decisions that were either flawed or no longer relevant will take considerably longer than just fixing the problem at hand.

There may be the odd time, particularly in a large, well defined piece of work, where you can say actually tidying up this other stuff will save time overall. Or perhaps you can batch a bunch of improvements in the same system together into a larger, more thoughtful architectural improvement. All of that is great if you can do it, but it's often not possible.

As far as preventing future architectural issues by learning and strategizing, I feel like that's what we spend our entire career trying to get better at doing ;). But alas I, and everyone else, seem to continue making decisions that don't pan out long term. Even if you did make a perfect decision at the time, often the world/business/third party dependency changes, and what was an excellent decision in the past becomes a pain point a few years later.

It used to be the case that we tried to design infinitely extensible software so future requirements could always be incorporated, but that makes the software unmaintainable. So the pendulum swung to YAGNI and only designing for exactly what was right in front of you, but that leads to major architectural overhauls every few months. The true answer is somewhere in the middle, but learning where is something that only seems to come with decades of experience.

Unfortunately older programmers all seem to be forced out of developing and into management or other careers for some reason.

I'm still trying to challenge your assumptions. Why does a different solution necessarily require expanding the scope of work? Like you said, that's where experience helps to have those skills in your toolbox. Doing things better doesn't have to be harder.

That’s a straw man argument. Of course nobody will allocate 100% of their time to cleaning things up instead of delivering features. That’s the path to bankruptcy. What you should do is allocate (say) 10% to improving the system. For example, on a team of 10 software engineers, allocate one to refactor/improve/simplify/remove pain from the process itself. It will pay for itself many times over in the long term because of improved productivity. And even better: take turns. Each developer gets 10% of their time to improve/speed up the things that are slowing them down or are painful/frustrating. The morale boost and increased productivity are worth much more than the time spent.

> In my career I feel like I have seen hundreds of examples of me saying the systems equivalent of “let’s put the dining table indoors?” to be told that the dining table is outside because the original budget meant the front door could only be yay wide so we had to leave the table in the yard and put a tent over it. And I’m just left standing there agape at how we eat in a cold wet tent every night instead of fixing it.

I have, too. And then I usually haven't managed to put the dining table indoors. And then new people came in and asked the same question you ask, and by then I was one of the people who tried to put the dining table indoors, and explained how it wouldn't fit through the front door, and how I tried to get it in through the window. And then the new people try to put the table indoors and fail and next thing you see they're either leaving the house or explaining to the newcomers why the table is outdoors.

Ultimately, I've realized that talk like this is cheap, unless you can actually improve things. That requires leadership skills and some political capital in your organization. I don't think the author of the article deserves an engineering leadership role simply for complaining about things. (They might still deserve an engineering leadership role for other reasons, what do I know...)

> simply for complaining about things.

With apologies to Antoine de Saint-Exupéry: if you want to build a better system, don't drum up Jira tickets to gather user stories, make sprints and divide the work and give orders. Instead, teach them to yearn for a system that's not total bullshit.

Simply complaining is tiresome. Writing a well-reasoned internal blog post that explains the faults, gets traction for improving things, and gets people excited for your brave new world, even though it's not arrived yet; that blog post is what engineering leadership looks like.

> Instead, teach them to yearn for a system that's not total bullshit.

I worked for an organization full of such people. Intelligent, competent, and worked hard. And yet... the system, ehh...

(I read Citadelle on a long bus ride many years ago. It was exactly what I needed to read back then, and I enjoyed it very much. Thank you for reminding me of it.)

One of the biggest systems-level failures in recent memory is the Boeing 737 MAX story. I read this article and your comment and then went looking for an autopsy, found this:

"The Boeing 737 MAX: Lessons for Engineering Ethics (2020)"

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7351545/

It's an example of a workaround that should not have been tolerated:

> "The Maneuvering Characteristics Augmentation System (MCAS) software was intended to compensate for changes in the size and placement of the engines on the MAX as compared to prior versions of the 737."

Rather shockingly this wasn't even an engineering problem workaround; it does seem that it was solely designed to avoid an aeronautical reclassification of the aircraft that would have required pilots to undergo an expensive retraining program on flight simulators, which might have caused lost orders.

This does look like a systems-level failure, but one at an organizational level: the system went from a state where engineering took priority, to a state where financialization took priority. In systems thinking, this could be called a state transition: a fluctuation takes place, and afterwards the system settles down to a new (apparently) stable state quite different from the old state:

> "One factor in Boeing’s apparent reluctance to heed such warnings may be attributed to the seeming transformation of the company’s engineering and safety culture over time to a finance orientation beginning with Boeing’s merger with McDonnell–Douglas in 1997 (Tkacik 2019; Useem 2019). Critical changes after the merger included replacing many in Boeing’s top management, historically engineers, with business executives from McDonnell–Douglas and moving the corporate headquarters to Chicago, while leaving the engineering staff in Seattle (Useem 2019). According to Tkacik (2019), the new management even went so far as “maligning and marginalizing engineers as a class”."

Or as one person called the post-merger company: MacDac in a Boeing suit.

> It’s infuriating to join a new team held hostage by years and years of band aids because they never suffer the bigger picture consequences.

What’s even more infuriating is seeing new engineers join that team, question why the hell something insane is insane, and then slowly grow used to the insane thing. Only for the cycle to repeat when the next new person joins.

I make a huge point of confirming to them that yes, we do acknowledge it’s insane. Even if management doesn’t care.

Corroding the willingness and ability to think and communicate clearly. Why bother? Not only will nothing change, it won't even be respected.

> In my career I feel like I have seen hundreds of examples of me saying the systems equivalent of “let’s put the dining table indoors?” to be told that the dining table is outside because the original budget meant the front door could only be yay wide so we had to leave the table in the yard and put a tent over it. And I’m just left standing there agape at how we eat in a cold wet tent every night instead of fixing it.

Oh wow, that hits home. To be fair, the historical context for the decision can be valuable information, the problem is the next step. Even if you can't fix it right now, you might make steps towards that. Or you might say: Now that we have this heavy table outside, why not attach more things to it?

I agree, the impasse that I often see is between people who think a change must happen “now” and those who think it should happen “never”. There is a lot of space between those positions, an optimal usually exists in there.

It’s just like re-factoring heavily interdependent code, except without the advantage of the dependencies being written down

> the impasse that I often see is between people who think a change must happen “now” and those who think it should happen “never”

The problem is that the organisational equivalent of ‘later’ is ‘never’. Therefore, if something needs to actually be done, the only time is now.

> There is a lot of space between those positions, an optimal usually exists in there.

Yeah but finding, or rather estimating, this optimal is a lot of work, and requires you to have one foot in both camps, and some kind of process/authority to make a decision, and some incentive to make a short term sacrifice for long term gain. That's just not going to happen in a weekly sprint planning, in a company that's aiming for the next quarterly report.

By all means say the change will happen in 3 months time instead of never.

Then in 3 months time the choice is the change will happen now, or it will happen never.

You can schedule it for the future, but the choice will always be "do it now" or "don't do it now".

This is a really helpful way of describing a solution.

everyone's worked in companies like this. did you try formulating very specific, actionable migration plans at any of these jobs? It's one thing to say, "this is stupid! we should use XYZ" and expect everyone to say "wow! you're right, we'll do that right away", and another (quite another) to actually formulate the superior architecture concretely, break it into digestible migration steps, sell the organization on if not the whole architecture at once, then at least on the first several migration steps, and to guide the organization into that new architecture for real.

obviously this is more or less possible given specific organizations and personalities at said organizations. my point is more that the magnitude of the task of migrating the ossified organization to a better architecture, even with a fully pliant staff totally on board with changing, should not be underestimated.

... because we forgot that you can take the legs off of the table ... because the screwdriver we needed to take off the legs was in use elsewhere when the table arrived ...

Just buy or build a different table.

Don’t ask for permission, just fix the stuff so that it works for you and maybe your small team and then announce it.

There is no documentation and no planning? Just start writing documentation, just start planning. If you need permission, I grant this to you. I’ve seen too many internal projects not even having a README, so this is now something I start whenever I have to debug something and wish I had documentation.

Someone needs you to do something? Ok, I’ll share my screen, start asking questions, and write all the important things down.

And now you try to suggest using a screwdriver, but your suggestion gets immediately attacked and buried because the team already has an established culture around the impossible table, and they don't want to change their ways or be made to look incompetent by the revelation that there was an easy solution all along.

Or it can be as simple as someone's (or a whole team's) job being to maintain those workarounds and bandaids, and they're very invested in keeping their job (or they hold all the IP in their head).

Not just an established culture but a product with a long history of success despite its jank and problems. I appreciate the enthusiasm of new hires but often they don't understand priorities or that the goal is profit and not perfection (for most of us).

There are two of me. The me of now, and the me of hindsight. Hindsight me is way smarter, and able to criticise every decision we made leading us into the mess that is now. Now me needs to make quick decisions based on imperfect information, budgets and tight timelines.

If it were up to hindsight me, we'd have carefully designed and orchestrated every past decision. We'd do full design qualification and change control on projects and purchases, researching carefully to never make a mistake. We'd sit as a team and brainstorm every possible implication. We'd write, execute and document tests for everything to prove ourselves. Any sniff of inefficiency and we would stop everything and fix it, no matter the cost. We'd take time to document, investigate and follow through. It would be a glorious cavalcade of plans and CAPAs, qualifications and tests. And reports! Binders and binders of wonderful validation reports everywhere!

If it were entirely up to hindsight me, we'd run our little widget company like we're building a space shuttle. Of course, we'd never make any money. But we'd be doing it right, by gum!

Most companies need to be somewhere in the middle. No hindsight and you end up a tangled mess of short-sighted kludges, all hindsight and you can't move forward. Either way you risk ending up a lead balloon.

If you find yourself in the kludge company territory, then here's some advice from a talk I recently attended[1]:

Start by training yourself and your team in Root Cause Analysis. Empower them to start thinking deeply and critically about what's really causing the problems and inefficiencies you encounter. Understanding root causes naturally translates into solutions that aren't just bandaids. Use these skills in your day to day, and you'll start building a culture of quality around your systems.

[1] Steve Gompertz

> I wouldn’t dare count the number of times I’ve been told the technical details of why something is the way it is, without anyone ever saying the reason why we actually wanted it to be this way. My thesis was usually: we don’t.

In my experience, at the time the decision was made, folks did want it that way. The organization has lost that context as to why and has only documented the technical design.

A curse shared by less effective engineers I've worked with is to rage at legacy decisions unable to convince the organization to revise them. They lack the ability to understand the various stakeholders involved and to come up with a plausible plan. A systems engineer (as referenced in the blog) would understand the various sub-systems that make up an organization and be able to drive the change they desired (the conclusion that it's irreparably broken or you lack the expertise to fix it would be fine too).

Maybe it's just over my head, but the article was quite a letdown after your great analogy. It may have valuable information in it, but I don't see how I could share the article with the people who might need to hear it and have them understand or care. It's preaching to the choir.

Love your post, though!

> It's preaching to the choir.

The thing is, if your manager is not already inclined to actually read the post, they’re never going to be receptive to that kind of change anyway.

Hence preaching to the choir?...

Great comment on a good article.

I'm reminded of back when I was studying pattern recognition for a system that would become an Expert System (this was before that term was used). I would read many articles saying what techniques would work. I had the urge to ask "But this doesn't show how you got there. Show me your discarded solutions that didn't work." I would like to see your wastebasket.

Similarly, I am inclined to ask someone who acts as an expert to tell me five ways that won't work.

> Somehow, when the real examples of this stuff are clever engineering around build / docker / polyrepo / release / feature flags / third party bugs, the cleverness makes people think the existence of the workaround should be tolerated. It’s infuriating to join a new team held hostage by years and years of band aids because they never suffer the bigger picture consequences.

The only logical reason to do this is because it has no impact on the business. Or at least, smaller impact than a total rewrite/refactor would have.

If an engineer presented a case where fixing an underlying issue resulted in better business outcomes vs. a short term band-aid, then I don’t know anyone that would tell them no. Businesses want to succeed. They want to make money. If you can help me make more money I’ll let you do whatever you want (within the confines of the law and civil society).

I think half the time there was no need for a dining table in the first place, just a shiny solution waiting for the question to be figured out afterwards.