by cameronh90 1524 days ago
Devil's advocate:

If you spend all your time refactoring, cleaning up after legacy design constraints, fixing ossified errors, then you run out of time and fail to write actual meaningful income generating features. Conversely if you make none of those improvements, eventually the weight of bad architecture slows all progress to a halt and no new income generating features get delivered.

One of the hardest parts about advancing as a developer, in my opinion, is being able to tell when you should refactor versus just leaving the old working mess alone.

With your example, it's like if it turns out that the dining table is embedded into the ground with concrete because it kept blowing over, and moving it indoors would require getting a carpenter to create new legs. And also that because the dining table has been there for so long, someone decided to run electricity cables through it, so rerouting it requires an electrician and will shut down the factory at the bottom of the garden for half a day. We could buy a separate table for indoors, to try and slowly migrate to the new table, but then we'd have two tables to maintain and we all know how that usually goes.

At a certain point, you look at it and go well the commercial dishwasher is just $9k and we can focus our efforts on building that loft conversion for now.

4 comments

This is a naive question from someone who hasn't been involved in project management in almost a decade, but why can't feature development and refactoring be split across two different ownership groups?

The first group writes the initial version and all iterations (extra features) up to the point where the expected returns from quickly pushing out future iterations are less than the amount of required effort.

The second group then comes in and does a complete refactor, without changing the look or feel of anything that the customer actually wants. Meanwhile, the first group moves onto "the next big thing".

There’s a parallel comment warning of the second system syndrome, but I’d like to point out the problem of “who gets the credit?”

Google famously suffers from this: the PM who launches a product gets promoted; the person who adds a feature gets some credit in the employee review; and the person who fixes bugs is judged to have wasted their time.

I suspect Apple has this problem too (they definitely prefer reimplementation rather than evolution in many cases) but their processes are more opaque.

Who would want to be on the cleanup team when the glory goes to the path breakers?

> Who would want to be on the cleanup team when the glory goes to the path breakers?

This and the second system syndrome are both organizational issues. Why couldn't an organization simply freeze the customer facing portion of the application (so no UX or added features) and tell a group of developers responsible for the refactoring that they will be judged on a set of achievable metrics, such as decreased infrastructure costs or better performance?

This would fall squarely within common managerial frameworks (it's basically Tuckman's group development model, or what you see at many startups that launch an MVP), except that the initial application development is handled by a different group of 'high performing' developers.

Because there are benefits that aren’t quantifiable. Developer velocity is incredibly challenging to measure objectively.

These initiatives have to be top-down priorities in the organization, with agreed upon importance.

1) The reason why such a refactor might be necessary is from things the first group tried but didn't quite work as intended, or that the users used the system differently to intended. The first group has that knowledge, the second group doesn't. So the first group will do the refactor better than the second group.

2) Beware of Second System Syndrome https://en.wikipedia.org/wiki/Second-system_effect where everyone tries to put in every feature that was missing from the first system, simply because there is no urgency around the second system, because the first system is already running.

Battle plans never survive first contact intact and software never survives customer deployment intact. Especially in systems that are ever evolving.
so true, in my experience.
The "rockstars" who write the first system will be despised by the "clean crew" who have to maintain it after it's no longer easy to add features...
That's a great elaboration, very accurate, and funny! But in reality that table would always get fixed. The only reason such absurdities remain in IT systems is that they're not immediately visible.
I disagree completely - most people know about the problems and they've accepted them instead of trying to gin up the organizational effort to fix them, because the last time they tried that they either became responsible for the cleanup or got smacked down by someone who should have been but got it wrong.
Sometimes they even have to create a separate project within the organization to clean up the mess the owning team won't take on:

https://github.com/Microsoft/VisualStudioUninstaller/release...

The choice between refactoring and money-generating work is a false dilemma. There are other options, and the developer doesn't have to make that decision or carry out the work all on their own.
Indeed.

If the code has turned to spaghetti then how do you manage to change code quickly (due to e.g. Corona rules) so you can follow where the market went and not get competed out of business?

When the company is in startup mode and has no customers, it's easy to just throw more mud on the wall.

But when you have an existing business based on 1M lines of code and you want to keep being in the market when the market changes quickly, then spaghetti code can be death. Being ready means having cleaned up code beforehand so it is easy to change it.

At an organisational level, you have to make a decision on how much time you spend doing one or the other. It might be that some developers never do any refactoring but someone is always going to end up doing it. Or nobody does it, and the code slowly decays.

Unless you're saying that you don't have to do refactoring at all in the organisation, but the only way to do that surely is to always get it right the first time, which isn't hugely practical. You may sometimes encounter a situation where the quickest way to build a feature is to fix some old ugly code, but that's certainly not the case every time.

My whole point is that you don't, because there aren't always just two options. That's the false dilemma logical fallacy.

I'm saying you can fix problems without dropping everything and redoing work. You're allowed to problem solve and work with people to create a third option. And you can prevent new ones by learning and strategizing.

Well whether you drop everything or clean as you go or whatever other strategy, fixing stuff takes time. Even if it's just the mental effort of designing a better way and consensus building.

I'm just using simple analogies for the sake of explanation, but it is nearly always the case that expanding the scope of work to fix previous architectural decisions that were either flawed or no longer relevant will take considerably longer than just fixing the problem at hand.

There may be the odd time, particularly in a large, well defined piece of work, where you can say actually tidying up this other stuff will save time overall. Or perhaps you can batch a bunch of improvements in the same system together into a larger, more thoughtful architectural improvement. All of that is great if you can do it, but it's often not possible.

As far as preventing future architectural issues by learning and strategizing, I feel like that's what we spend our entire career trying to get better at doing ;). But alas I, and everyone else, seem to continue making decisions that don't pan out long term. Even if you did make a perfect decision at the time, often the world/business/third party dependency changes, and what was an excellent decision in the past becomes a pain point a few years later.

It used to be the case that we tried to design infinitely extensible software so future requirements could always be incorporated, but that makes the software unmaintainable. So the pendulum swung to YAGNI and only designing for exactly what was right in front of you, but that leads to major architectural overhauls every few months. True answer is somewhere in the middle, but learning where is something that only seems to come with decades of experience.

Unfortunately older programmers all seem to be forced out of developing and into management or other careers for some reason.

I'm still trying to challenge your assumptions. Why does a different solution necessarily require expanding the scope of work? Like you said, that's where experience helps to have those skills in your toolbox. Doing things better doesn't have to be harder.
It doesn't always require expanding the scope of work, but very often does. I even suggested a few situations where it doesn't, but in many cases fixing the true underlying problem involves expanding the scope of work.

It's hard to argue the nitty gritty without examples so here's a real world one from quite a long time ago, in a company that went bust after the death of the owner.

--

We had a system that had a significant quantity of code written in a custom language that would be compiled by an internally written compiler. This compiler was in some ways a work of genius, written in the 80s, but it had a lot of very deep architectural flaws in the optimiser that meant certain patterns of code would generate invalid output. We didn't write much new code in this language but had a pretty large body of code that needed to continue running.

So during a server hardware refresh, we found that almost everything was crashing. Turns out, a compiler optimiser flaw meant that any time a loop had a number of iterations that wasn't a multiple of the number of CPUs, generated programs would segfault.

We investigated what it would take to fix the underlying issue but it would have been a week or more of work just to understand why it was happening. Porting all the old code would have taken even longer.

Instead what we did was, using a pre-existing AST manipulation library we had written, add a prebuild script that hacked all of the files to include a CPU count check then pad out the number of iterations with NOPs. Took a few hours and unblocked the server upgrade.
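The padding arithmetic behind that workaround can be sketched in a few lines of Python. This is only an illustration: `padding_needed` is a hypothetical name, and the real prebuild script rewrote the AST of a custom language, which isn't reproduced here.

```python
def padding_needed(iterations: int, cpu_count: int) -> int:
    """Return how many no-op iterations to add so the loop's total
    iteration count becomes a multiple of cpu_count.

    Hypothetical helper for illustration only.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    # Python's % always yields a non-negative result for a positive
    # modulus, so this is the distance up to the next multiple.
    return (-iterations) % cpu_count


# A loop of 10 iterations on a 4-CPU machine needs 2 NOP iterations
# to reach 12; a loop of 8 needs none.
print(padding_needed(10, 4), padding_needed(8, 4))
```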

--

Another, perhaps less esoteric and more recent example:

A third party open source library we use had an issue where a particular function call would sometimes get stuck in an infinite loop due to incorrect network code in the library interacting badly with our network hardware.

We submitted a bug report and fix, but the maintainer wouldn't accept a fix unless we also changed a bunch of other related code, added a bunch of tests etc. which we didn't have time to do. We considered a fork but that would involve keeping it up to date, rebuilding packages and so on.

We worked around the issue by running it in a different process and monitoring CPU usage. If CPU usage goes beyond a certain threshold, we kill the process and try again.

Workaround was quick and has been working fine for over a year now. Contributed patch is still languishing in an open PR with various +1s from other users.
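For anyone curious what that kind of process-isolation workaround looks like, here is a rough Python sketch. It is not the code described above: it swaps in a wall-clock timeout for the CPU-usage threshold, because the standard library has no portable way to poll a running child's CPU usage. The name `run_with_watchdog` and its parameters are made up for illustration.

```python
import subprocess
import sys


def run_with_watchdog(child_code, timeout_s=1.0, max_attempts=3):
    """Run child_code in a separate Python process; kill it and retry
    if it does not finish within timeout_s seconds.

    Returns the child's stdout on success, or None if every attempt
    timed out. A non-zero exit raises CalledProcessError, so genuine
    failures are not retried silently.
    """
    for _ in range(max_attempts):
        try:
            # subprocess.run kills the child itself when the timeout
            # expires, then raises TimeoutExpired.
            result = subprocess.run(
                [sys.executable, "-c", child_code],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                check=True,
            )
            return result.stdout
        except subprocess.TimeoutExpired:
            # Assumed policy: a hang is treated as transient, so retry.
            continue
    return None
```

Something like `run_with_watchdog("while True: pass", timeout_s=0.5)` gives up and returns `None` after three attempts, while a well-behaved child just returns its output. The point of the separate process is that the stuck call can be killed from outside without taking the main program down with it.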

That’s a straw man argument. Of course nobody will allocate 100% of their time to clean things up instead of delivering features. That’s the path to bankruptcy. What you should do is allocate (say) 10% to improving the system. For example, on a team of 10 software engineers allocate one to refactor/improve/simplify/remove pain from the process itself. It will pay for itself many times over in the long term because of improved productivity. And even better: take turns. Each developer will get 10% of their time to improve/speed up the things that are slowing them down or are painful/frustrating. The morale boost and increased productivity are worth much more than the time spent.