by gampleman 1667 days ago
One thing that bugs me is the notion that "Software rewrites are something you should never do". It is a mantra repeated so often that it has acquired the status of self-evident truth, even though the only evidence (usually) presented is the example of a web browser from 20 years ago! (Which incidentally spawned Mozilla, so not exactly a complete loss, especially from the POV of society rather than shareholders, but I digress.)

Having rewritten a bunch of systems (sometimes several times) I can attest that it will not always lead to the death of the company. The trick is of course having modular enough systems that they can be rewritten from scratch in a reasonable amount of time.

It can also be a great way to simplify the system, since the old version was typically designed with a very imperfect understanding of the problem and no operational experience servicing it. Later learnings were usually crudely patched on top, and you often end up with a conceptual hodge-podge where words mean subtly different things depending on the context and translation layers have to be inserted between the contexts, etc.

Often a (good) rewrite starts by clarifying the conceptual model. I like the saying "clear writing reflects clear thinking", and in programming there is a lemma "clear thinking produces clear code".


I suspect the main issue with rewrites is that users or product managers see them as an opportunity to add new features or extensively redo old ones. In the end the rewrite is no longer a rewrite but a new product that is incompatible with the original it was supposed to replace. I have seen this happen a couple of times. A straight rewrite done for technical reasons, with a well-defined scope, does not suffer from these issues.
This is a great point, and my successful rewrites have done the opposite, reducing scope/capabilities. "We changed other systems and no longer need to handle x/y/z in this service", especially when most x/y/z's are edge cases now eliminated.
One huge issue with a lot of technical and product debt is that any rewrite gets hit by a bursting dam of pent-up expectations. Many people who have been told for years that they can't have their feature, because the software team's velocity has been close to zero for so long (hence the rewrite), suddenly push their demands onto the new product. As a consequence, it's hard for a rewrite to focus on an MVP.

Arguably it can be better to float a completely different boat and see if it swims but that can result in a product positioning problem where you then have two versions of the same product but with an uneven feature set.

The way it was once told to me:

Old developers have left. New ones come in. The systems the old devs built suck, so the new team convinces management to do a rewrite. The new devs don’t really understand the problems encountered by the old ones so they repeat all the same mistakes. New system, same problems.

The lesson we’re supposed to take away is “no one should ever do full rewrites”. It’s a stretch, but IMO the proper takeaway should be: 1) really understand the old system, and 2) have a very good reason before doing a full rewrite.

The way I've heard it is "Evolution, not revolution".

You evolve your code with refactoring and rewriting only pieces at a time. This is opposed to revolution, also known as "the big rewrite", which replaces the entire application all at once.

Your "modular enough systems" seems to indicate that you also favor evolution over revolution.

> despite the only evidence being (usually) presented is an example of a web browser from 20 years ago!

Actually, the evidence is normally lots of other companies that failed at rewrites. It's just that this one was a full-scale fuck-up. I'm currently working at a company that is literally echoing Netscape. The issue wasn't the rewrite itself; it was rushing a half-finished rewrite out the door, and stopping product development for so long.

My current employer started a rewrite but called it a migration and gave it a 3-month deadline: 3 months to write all the features it took 7 years to write. They realised this was impossible, removed a bunch of features, and decided the rewrite would drop features they would add back later. But they kept setting deadlines measured in months for something that has taken 18 months so far, with even more features removed. It's almost a daily occurrence that yet another product feature gets removed to cut down time. They claimed they were feature complete in September because it had to be done under all circumstances, and then found out they hadn't done 50% of the features they said they would. So, while rushing the features they hadn't written, they started talking about releasing it before it had passed QA. They announced the release date before it had passed QA. We have partners using it and saying it's broken for them; they don't have all the data. And yet they're still releasing it on Monday. Why? Because it had to be done in 2021. They're rushing a half-finished rewrite out the door to hit a target set by management. So they spent 18 months removing features, and when they release it, it will be buggy.

So yeah, I mention Netscape a lot because, honestly, this sounds the same: rush out a half-done rewrite while letting the competitors improve their product as we made ours worse.

> Having rewritten a bunch of systems (sometimes several times) I can attest that it will not always lead to the death of the company. The trick is of course having modular enough systems that they can be rewritten from scratch in a reasonable amount of time.

I would say that the trick is not to do the rewrite. You refactor each part until the entire system is rewritten.

> I would say that the trick is not to do the rewrite. You refactor each part until the entire system is rewritten.

This is exactly the right way to approach this. The best way I’ve seen to rewrite a complex system is to literally work off a branch and deploy it in QA beside the old version. The hardest part is figuring out the right way you want to direct traffic to the “new version”.
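The comment doesn't say how that team directed traffic. One common approach is a sticky percentage rollout: hash a stable request key (a user or account ID, say) into a bucket and send buckets below a threshold to the new version. A minimal sketch, assuming such a key exists; `route_to_new`, `handle` and the handler arguments are hypothetical names, not anything from the original system:

```python
import hashlib


def route_to_new(request_key: str, rollout_percent: int) -> bool:
    """Return True if this request should go to the new version.

    Hashing a stable key keeps routing sticky: the same key always lands
    on the same side for a given percentage, and raising the percentage
    only ever moves keys from old to new, never back.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in [0, 100)
    return bucket < rollout_percent


def handle(request_key, payload, old_handler, new_handler, rollout_percent):
    # Hypothetical dispatcher: pick a side per request, then delegate.
    use_new = route_to_new(request_key, rollout_percent)
    handler = new_handler if use_new else old_handler
    return handler(payload)
```

Starting at 0 and dialing the percentage up gradually keeps every step reversible, which fits the "chip away at it" approach described here.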

My team inherited a massive system that was the key revenue generator for a multi-billion-dollar company. It was an operational nightmare, from deployments to stability. It had at least one 24-hour outage that was nearly impossible to root-cause.

We slowly chipped away at it for 8 months, running it in parallel in QA until we were satisfied that it was functionally equivalent. Then we started sending traffic to it in prod, tuning it as it took on real traffic, and had the whole thing replaced in 18 months.
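One way to check "functionally equivalent" while running in parallel is a shadow (or "dark") run: every request goes to both versions, the old one's answer is returned to callers, and any disagreement or failure in the new one is recorded for investigation. A minimal sketch, assuming both versions can be called as plain functions on the same request and their results compared with `==`; `ShadowRunner` is a hypothetical name, not the commenter's tooling:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ShadowRunner:
    old: Callable[[Any], Any]
    new: Callable[[Any], Any]
    mismatches: List[Tuple[Any, Any, Any]] = field(default_factory=list)
    errors: List[Tuple[Any, str]] = field(default_factory=list)

    def __call__(self, request):
        expected = self.old(request)  # the old system stays authoritative
        try:
            actual = self.new(request)
        except Exception as exc:  # a failure in the new path must not reach callers
            self.errors.append((request, repr(exc)))
        else:
            if actual != expected:
                self.mismatches.append((request, expected, actual))
        return expected
```

For example, with `old=lambda x: x * 2` and a new version that is wrong only for input `3`, calling the runner on `0..4` returns the old results unchanged and leaves a single entry, `(3, 6, 0)`, in `mismatches`. In a real system the new call would usually be made asynchronously so it doesn't add latency.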

The system was replaced, handles 2X the load of the old system in prod, and hasn’t had an outage in years.

Having dealt with a similar setup that basically had to be rewritten, I totally agree. I would also add that if you find yourself in a situation like this, where the system is a messy build-up over years, try to resist adding anything that is not absolutely essential. Otherwise, whoever does the clean-up/rewrite down the line may be forced to take not-so-clean approaches to accommodate those non-essential bits, mainly for backwards compatibility.
Rewrites are also sometimes very good for your career. A friend who had been working at Google for ~1.5 years said a new Sr. Director of Engineering joined, and their first decision was to rewrite the project, which would take 2 years.

So now the Director has job security for about 2 years, gets the big launch near a promotion cycle (with a small chance of being considered for promo), and gets to blame the predecessor for all the problems.