by hn_throwaway_99 2176 days ago
I'm a big fan of Joel Spolsky's earlier blog posts, but to be honest, I don't think this piece has aged well. If anything, I'm more of the opinion now that you should almost always plan to do a rewrite, eventually. Lots of big companies successfully rewrite stuff all the time. Google is fairly well known for having rewritten large, critical pieces of their codebase over the years.

What should be warned against is big-bang rewrites. Netscape's problem wasn't that they did a rewrite (which eventually became Firefox, mind you); it's that they essentially abandoned their old code too early, and they also announced the rewrite too soon.

If you're going to do a rewrite, do it quietly, don't announce it until it's close to ready, and even then roll it out slowly. The infamous Digg v4 debacle is another example: the problem wasn't that they did a rewrite, it's that they rewrote the site into a product nobody wanted, and they burned any possibility of going back once they released it.


The more experience I get, the more I think Joel was right. Developers want to believe a rewrite from scratch can succeed because green-field development is easier and more fun than slowly refactoring inscrutable legacy code. But long before the rewrite project achieves feature parity with the original, it is already marred by the same issues that motivated the rewrite, because the dysfunction that turned the original code into a mess has not been addressed.

"But we are smarter than those other guys who wrote this mess" - if that is the case, you should be smart enough to fix the mess.

I think I mostly agree with your take, but if I could defend the counterargument for a moment, there exist stronger cases for a rewrite than "we're smarter than those other guys." Namely, what happens if your requirements have fundamentally changed?

Classic Mac OS was written at a time when computers had kilobytes of memory and black-and-white graphics. It was not designed for multitasking, because at the time it was written, the hardware to make multitasking practical did not exist. It was not designed for networking, because the internet mostly did not exist at the time. It was not designed for security, because computers were a less important target for criminals, and no internet meant far fewer exploit vectors.

All of these features were bolted onto Mac OS after the fact, and the result was an unstable mess. It was ultimately a full OS rewrite that fixed the platform.

---

I'll make a separate case too, for a rewrite I wish would happen: it's all well and good for Slack to build their client in Electron as a small startup that needs to experiment and iterate quickly. However, Slack is now a reasonably-sized public company, and they (should) have a stable product that will not undergo rapid changes.

Now would be an excellent time for Slack to rewrite their client to be a native app on each of the major platforms. They have the resources to create a snappy, performant app that more customers will enjoy using.

But didn't Apple actually attempt a full rewrite of Mac OS, which eventually failed? So instead, they bought an already-working OS and adapted it for their purposes.
Well, the way I think of it, NextStep was still effectively a "rewritten from scratch" OS; it just happened to be written outside of Apple.
That is not what Joel means by rewriting from scratch. NextStep was not written to replace Mac OS, it was just later adapted into that purpose. It is a completely different scenario.
Oh yeah. I remember that clearly. It was Copland- er, MacOS 8.

What a cluster----. Back then, crashes would result in a special kind of debug screen called MacsBug[0].

When you walked into their "release-ready, hands-on lab," almost every screen was displaying MacsBug.

The change didn't actually happen until NextStep became Cocoa.

I was also at a Microsoft "Longhorn" prerelease event. They were showing "live code demos," but you could clearly see the presenter quitting Macromedia Director when they were done with their demo.

That became Vista, another famous success story (but at least it did ship).

[0] https://en.wikipedia.org/wiki/MacsBug

From Wikipedia's Copland article (https://en.wikipedia.org/wiki/Copland_(operating_system)):

"The Copland development effort is associated with empire-building, feature creep, and project death march. In 2008, PC World named Copland on a list of the biggest project failures in IT history."

> what happens if your requirements have fundamentally changed?

That's not a rewrite; that's a new program with the same name.
What happens is that you write the first version, and then the requirements fundamentally change (or you learn that your requirements were fundamentally wrong), but it happens gradually, and you keep changing the old version. It now mostly meets the new set of requirements, but it carries a bunch of cruft from what it used to do.

Throwing that away and writing something to support the new set of requirements is a rewrite.

I take no position, here, on whether it should be done.

I would recommend just removing the cruft rather than throwing everything away. But cruft should really be removed continuously.
I was merely being descriptive, clarifying that there are not uncommon situations where requirements have changed and yet nonetheless one of your options is aptly termed a rewrite.

If I am to move into making recommendations, then I think the whole thing is ultimately situation-dependent, but your advice is correct for the most common cases.

Rewriting because it is messy will almost never work. Rewriting because there is a fundamental problem with the code base may be worth it.

Problems can include:

- lack of parallelism: designed with single-core CPUs in mind

- lack of security: software designed for single-user, offline operation that later became multi-user and online

- "wrong" optimizations: for example, relying on a lot of precomputation when people now want to change everything on the fly, which modern computers allow if the software is properly designed

- relying on outdated tech, like Flash

None of the previous points involves bad design, but things change, including user expectations. For now, security is a big one. It wasn't a big deal back then: on a game console, for example, a buffer overflow in a game would just cause a crash and, in some extremely rare case, piss off the player. Now, because your console is online and so is your bank, the same relatively harmless bug in the same game can be used to siphon your bank account.

If the code is reasonably well designed (low coupling, separation of concerns and so on), then architectural changes can be introduced by refactoring. I can't imagine an architectural change that would require every single line of code to be rewritten in a realistic application. Most likely there will be large chunks of complex business logic that are largely unaffected by the architectural change.

But migrating away from an obsolete platform probably does require a massive rewrite. Even then, it might be possible to port (rather than rewrite) large chunks.

Actually, the more experience I get, the more I think Joel was wrong. Like anything in software, I've had the experience that there are good ways and bad ways to do rewrites. Start with the reasoning for doing it:

Bad reasons: "The code is messy", "It's written in language foo while all the cool kids are using language bar these days", "It's slow".

Good reasons: "The architecture is too tightly coupled now, and it won't scale without untangling major pieces anyway", "Our lack of forward velocity is directly related to problem XYZ in the code, and here is how a new architecture would fix that."

In other words, I feel like I can sniff out bad rewrites now: they generally lack a sense of focus and a true, enumerated list of problems with the current code base. Good rewrites have clearly delineated benefits, stuff that brings measurable value, and show that the rewrite will deliver those benefits better and more cheaply than is possible with the current codebase.

> Actually, the more experience I get, the more I think Joel was wrong.

There is no black and white in software. Joel writes from a viewpoint that's not always applicable to all of us.

> There is no black and white in software.

That is my whole point. The title of the blog post is "Things You Should Never Do", and it highlights "They did it by making the single worst strategic mistake that any software company can make." (emphasis Joel's).

Doing a rewrite can be a horrible mistake, but Joel was simply wrong to advise that no company should ever do it. Lots of companies have done it, quite successfully.

> But long before the rewrite project achieve feature parity with the original, it is already marred by the same issues that motivated the rewrite.

I think there are definitely counterexamples. Can you imagine using an OS in 2020 based on incremental improvements to Mac OS 9 or Windows ME? Or browsing with a browser based on incremental improvements to Netscape 4.7?

These are hypotheticals, but isn't the Blink engine the product of incremental improvements going all the way back to KHTML? It has been more successful than Mozilla's engine.
But again, in the Windows ME vs. WinNT/Windows 2000 case, it wasn't a rewrite. It was an existing product being improved and brought into the consumer space.
But Windows NT itself was a rewrite not an evolution of Windows 3.1.
Windows NT wasn't a rewrite of Windows 3.1, but an attempt by Microsoft to create a modern workstation OS in a completely different product category from DOS/Windows 3.1.
> if that is the case, you should be smart enough to fix the mess.

The GEOM subsystem in FreeBSD is an example, though, that sometimes the world changes in ways you can't accommodate, and you have to go do a big-bang reset.

Because of that, FreeBSD has a much nicer set of methods for dealing with disks and hotplugging and resizing ... while Linux is still all very ad hoc.

There are usually other arguments for rewrites, though.

Some of the problems only came to light because the original software was written and showed its weaknesses. Of course, one always knows more afterwards.

Then there is the fact that software often doesn't match your use case perfectly, and you could have better software specialized for it. You might also have other ideas about how extensible and modifiable your software should be. There really is a lot of software out there that barely works for its use case. If you are asked to extend it, good luck doing that without introducing new bugs, thanks to its inflexible design.

So there are very valid reasons for rewriting, and often you do know better how to write the software for your own use case.

It's about even more than that: it's about the complexity.

There's an inherent complexity in the problem being solved.

And there is an "accidental complexity" in the implementation of the solution.

When throwing everything away, people typically believe they can avoid handling a lot of the "inherent complexity." But there is typically a good reason why the inherent complexity was addressed in the previous version of the program, and there's a good chance that the new "from scratch" designers will have to relearn and rediscover all of it, instead of transforming the knowledge already encoded in the previous version.

For anybody interested in the topic, I recommend the case studies presented in:

https://www.amazon.com/Search-Stupidity-Twenty-Marketing-Dis...

"In Search of Stupidity: Over Twenty Years of High Tech Marketing Disasters"

See the story of the WordStar rewrite, which simply didn't have the printer drivers the previous version had, or other features people already expected, leading to WordStar's demise.

Or what Zawinski named the "Cascade of Attention-Deficit Teenagers" (search the internet for that; the link here wouldn't work!)

"I'm so totally impressed at this Way New Development Paradigm. Let's call it the "Cascade of Attention-Deficit Teenagers" model, or "CADT" for short."

"It hardly seems worth even having a bug system if the frequency of from-scratch rewrites always outstrips the pace of bug fixing. Why not be honest and resign yourself to the fact that version 0.8 is followed by version 0.8, which is then followed by version 0.8?"

Or the interview with Jamie Zawinski in Seibel's "Coders at Work."

https://www.amazon.com/Coders-Work-Reflections-Craft-Program...

... "even phrasing it that way makes it sounds like there’s someone who’s actually in charge making that decision, which isn’t true at all. All of this stuff just sort of happens. And one of the things that happens is everything get rewritten all the time and nothing’s ever finished. If you’re one of those developers, that’s fine because there’s always something to play around with if your hobby is messing around with your computer rather than it being a means to an end — being a tool you use to get whatever you’re actually interested in done."

If one is able to cover all the complexity, and it is not destructive to the goal, the rewrite is OK. Otherwise, one should be critical of proposed rewrites, as they could secretly be motivated by the simple (jwz again): "rewriting everything from scratch is fun (because "this time it will be done right", ha ha)"

Your last sentence made me think humorously of Fermat's Last Theorem, and 350 years of grad students trying to prove it...so ironic and in that way funny.
Netscape’s big bang rewrite is in great contrast to the current work on Firefox Quantum. Mozilla is slowly rewriting pieces of the browser in Rust to gain better performance and security.
The reason is less important than the methods.
I can't imagine this to be true.
If you can rewrite in the style of Ship of Theseus, and get a higher quality result, sure.

But if the architecture is wrong, a piecewise rewrite to a new architecture is very tough.

My impression: it's still easier than a rewrite. People always underestimate a rewrite, because they only consider the complexity they can think of, which is generally only the tip of the iceberg. I've been on projects where, even though people nervously joked about the dangers of rewrites, they still underestimated the costs. Bonus points if afterwards people scratch their heads thinking, "Where the heck did all that time go?"

I think rewriting to a new architecture piecemeal is probably easier than in a big bang - assuming the thing is at all complex and you can't actually understand all of it at once. The difference is more one of perception. It's easier to see the costs of the Frankenstein architecture than the rewrite, so we overestimate the costs of the former and underestimate the latter.
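For what the piecemeal version can look like in code, here's a minimal Python sketch of what's often called the "strangler fig" pattern (my label, not one used above): callers go through a facade, and each operation is switched from the legacy code to the rewrite only once the rewrite implements it, with the old path kept live for rollback. All class and method names are hypothetical, just for illustration.

```python
from typing import Set


class LegacyBilling:
    """Hypothetical stand-in for the old code base."""

    def invoice(self, amount_cents: int) -> str:
        return f"legacy invoice: {amount_cents}"

    def refund(self, amount_cents: int) -> str:
        return f"legacy refund: {amount_cents}"


class NewBilling:
    """Hypothetical rewrite: only invoice() has been ported so far."""

    def invoice(self, amount_cents: int) -> str:
        return f"new invoice: {amount_cents}"


class BillingFacade:
    """Callers only talk to this facade, so each operation can be moved
    from the legacy code to the rewrite independently (and moved back)."""

    def __init__(self, legacy: LegacyBilling, new: NewBilling) -> None:
        self.legacy = legacy
        self.new = new
        self.migrated: Set[str] = set()

    def migrate(self, operation: str) -> None:
        # Refuse to route traffic to code the rewrite does not have yet.
        if not hasattr(self.new, operation):
            raise ValueError(f"{operation!r} is not implemented in the rewrite yet")
        self.migrated.add(operation)

    def rollback(self, operation: str) -> None:
        # The legacy path still exists, so going back is cheap.
        self.migrated.discard(operation)

    def call(self, operation: str, *args: int) -> str:
        target = self.new if operation in self.migrated else self.legacy
        return getattr(target, operation)(*args)


if __name__ == "__main__":
    billing = BillingFacade(LegacyBilling(), NewBilling())
    print(billing.call("invoice", 500))  # legacy invoice: 500
    billing.migrate("invoice")
    print(billing.call("invoice", 500))  # new invoice: 500
    print(billing.call("refund", 200))   # legacy refund: 200
```

The design choice that matters is that the legacy path stays live and rollback stays cheap until each piece of the rewrite has proven itself; a big-bang rewrite gives both of those up.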

The real case for a rewrite is if you honestly think all that old stuff really has mostly just sentimental value; i.e. that it's OK to break all kinds of workflows because there are enough alternatives. If you can sell actual users on relearning all their habits, you can get away with a lot.

Google tends to rewrite without all of the features. I guess you can get away with that if you're Google.
There's a running joke at Google that each internal tool has two versions: an old one that's deprecated and not maintained, and a new one that's being written and not ready. So I don't think these rewrites necessarily persist because they're being done well.

I keep seeing the same optimistic failure mode over and over again at each employer. It's just too tempting to rewrite, say, the SDK package deployment system rather than trying to understand and fix up the messy old Python that some past engineers left behind. And it'll only take three or four months! A year later, the messy old Python, bugs still unfixed, is still the only option because the rewrite isn't ready, and the rewrite has consumed all the resources that would have gone to fixing bugs.

I think a big reason people opt to re-write is a lack of desire to understand the problem as a whole and belief that anything overlooked will get fixed in production. I agree we need to be suspicious of these intentions.

Not to brag, but I recently spent a week rewriting some software I developed over 5 years (thanks, covid), and it runs 100x faster. No joke. Better code, better SQL, better indexing, less list manipulation. 100x faster.

That's great work, but anything you can do in a week and then throw away if it turns out badly is hardly a risky rewrite.
It's a lot easier to do now, because we are in a dependency-based system. We can often replace tens of thousands of lines of legacy with a single call to a SaaS API.

Of course, T.A.N.S.T.A.A.F.L, so caveat emptor. Needless to say, picking the right dependency is a big deal.