Hacker News
by jayvanguard 3820 days ago
The article is dubious at best. You hear about the big failures or notice when rewrite releases take forever, but you never hear about the companies that slowly lose opportunities or fail to keep up with innovators because they keep trying to patch a crappy codebase. Nor do you necessarily hear about the successful rewrites since they just happen under the covers.
2 comments

One of the major points of the Joel article is that the best option is usually one you didn't name: incrementally fixing the "crappy" codebase. At no point do you do a "big rewrite", at no point do you take a big step back, at no point do you lose the ability to make forward progress because the new code isn't ready and the old code is deprecated, etc. Even if it may take somewhat longer to get there, the integral of value over time often still comes out larger for incremental improvement.
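To make the "integral of value over time" point concrete, here is a toy sketch in Python. Every number in it is invented purely for illustration (the 36-month horizon, the value per month, the 18-month rewrite); none of it comes from the Joel article or from measurements. It only shows the shape of the comparison: the rewrite delivers nothing new until it ships, while incremental improvement delivers a little more each month.

```python
# Toy model with made-up numbers: compare cumulative value delivered.
HORIZON_MONTHS = 36        # assumption: how far ahead we look
OLD_VALUE = 10.0           # assumption: value per month of the untouched old system
TARGET_VALUE = 25.0        # assumption: value per month once fully improved
MONTHLY_GAIN = 0.5         # assumption: improvement per month on the incremental path
REWRITE_MONTHS = 18        # assumption: months until the rewrite ships


def incremental_value(month: int) -> float:
    """Value delivered in a given month when improving the old code in place."""
    return min(TARGET_VALUE, OLD_VALUE + MONTHLY_GAIN * month)


def rewrite_value(month: int) -> float:
    """Value delivered in a given month when the old code is frozen until the rewrite ships."""
    return OLD_VALUE if month <= REWRITE_MONTHS else TARGET_VALUE


months = range(1, HORIZON_MONTHS + 1)
incremental_total = sum(incremental_value(m) for m in months)
rewrite_total = sum(rewrite_value(m) for m in months)

print(f"incremental: {incremental_total}, rewrite: {rewrite_total}")
```

With these particular made-up numbers the incremental path comes out ahead, even though it reaches the target later. Shortening the hypothetical rewrite to 12 months flips the result, which is consistent with "often" rather than "always".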

Developers want the default answer to be "abandon the old mess and write a new one" (snarkiness fully on purpose); Joel's point is that the default answer ought to be incremental improvement. Not that it's always the right answer, but other answers ought to be scrutinized more closely than developers might like.

From a professional point of view, it's actually perfectly fair to consider that greenfielding a new project with hot new tech (or even "newer"-but-established tech, my personal favorite choice) is more fun than trudging through old code. Human factors matter a lot. But we are also professionally obligated not to overprivilege it.

Besides, if you treat it as a serious project instead of a series of hack jobs, in my experience, very serious incremental improvement still offers a lot of engineering challenge and fun. I think one of the biggest mistakes people make is to prejudge incremental improvement as a hack job, at which point it becomes a self-fulfilling prophecy.

Incrementally fixing the "crappy" codebase is great in some cases: Michael Feathers's book on rescuing legacy code is fantastic. However, the limit comes at the language barrier.

Consider a COBOL codebase on a mainframe where you cannot find developers interested in learning the language and developing against the mainframe is convoluted -- you may not be able to pay talented people enough to toil in those coal mines, and you have to overpay subpar talent (or chase after the handful of expert mainframe COBOL developers).

You can incrementally rewrite, just like you can incrementally refactor. Stick an API gateway in front of your COBOL mainframe that responds the same way it does, and then stand up a new well-architected service, microservice by microservice, that has a good API—and have the API gateway query the new service (using its nice API) whenever clients make calls to what they think is the legacy service, passing whatever calls you haven't re-implemented yet through to the legacy service.

Eventually, everything will be on the new services, and you can shut down the legacy COBOL system and just keep the API gateway there to pretend. (If you can get clients switched over to consuming the new APIs directly, you can shut down the API gateway too—but good luck with that; their side probably has mainframes too.)
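A minimal sketch of that gateway idea in Python, assuming requests can be reduced to a dict with an "operation" key. The class `StranglerGateway`, the handler functions, and the operation names are all hypothetical stand-ins; a real gateway would sit on the network and translate the legacy wire format rather than call Python functions.

```python
from typing import Callable, Dict, Iterable

Handler = Callable[[dict], dict]


class StranglerGateway:
    """Sends each operation to the new service if re-implemented, else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, operation: str, handler: Handler) -> None:
        """Register a new-service handler that takes over one operation."""
        self._migrated[operation] = handler

    def handle(self, request: dict) -> dict:
        """Clients call this as if it were the legacy system."""
        handler = self._migrated.get(request["operation"], self._legacy)
        return handler(request)

    def legacy_is_idle(self, known_operations: Iterable[str]) -> bool:
        """True once every known operation has a new-service handler."""
        return all(op in self._migrated for op in known_operations)


# Hypothetical stand-ins for the mainframe and for one new service.
def legacy_mainframe(request: dict) -> dict:
    return {"operation": request["operation"], "served_by": "legacy"}


def new_balance_service(request: dict) -> dict:
    return {"operation": request["operation"], "served_by": "new"}


gateway = StranglerGateway(legacy=legacy_mainframe)
gateway.migrate("get_balance", new_balance_service)

balance_reply = gateway.handle({"operation": "get_balance"})   # handled by the new service
transfer_reply = gateway.handle({"operation": "transfer"})     # passed through to legacy
```

The point of the design is that the routing table is the only thing that changes as the migration proceeds: clients keep calling the same entry point, and the legacy system can be shut down once `legacy_is_idle` holds for every operation anyone still uses.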

It's called the strangler pattern by many. (You probably know that, but our dear reader may want to follow up.)

It also assumes that you have a proper API to start. It assumes that you have the organizational maturity to handle synchronizing two separate systems and the distributed transactions that entails.

The other problem not mentioned in the rewrite/refactor conversation is that the most common reason for rewrites is that the business has backed itself into a corner and the assumptions under the first system do not apply to where the business wants to go.

"It assumes that you have the organizational maturity to handle synchronizing two separate systems and the distributed transactions that entails."

Well, to be honest, when we're talking about proper maintenance and advancement techniques, we must by definition be discussing the topic only for those with the discipline to correctly implement relevant techniques and policies. If your developers or management choose not to, be it for whatever reason up to and including total lack of requisite talent or experience somewhere, you've already lost and the only thing that can possibly save you is to address that problem first, and if that's not possible for whatever reason, you've simply already lost.

As a result, it turns out not to be an interesting case to discuss. Even if it is, probably, the dominant case in the field....

I think it makes the most interesting case. How do you level up an organization? It's a very hard problem that hasn't been solved yet.

In some markets, the talent to handle this type of project doesn't exist. In other markets, the cost to acquire that talent is more than the marginal cost of rewriting the system.

What you are saying sounds cavalier to me. Fact is, big rewrites are very risky and there's nothing appealing about working on them, I can attest! Any developer worth his salt should fight tooth and nail to kill such a rewrite project and then run screaming when business and management go forward anyhow with scrapping the working legacy system in order to "start fresh".

Lost opportunity does suck. It would be nice if the first system were written in an extensible way with lots of tests and correct documentation, but it rarely is. All software needs to be replaced at some point. All software has a finite useful lifespan. So, this is a common problem. But you must have a responsible plan for replacing any first system. Any sort of project plan that has a magic day where the legacy system is shut down and the second system is turned on is reckless and absurd. I've heard of failure rates for such projects being 80%, with the rest going way over budget before completion.

The best you can do when replacing a crufty old system is to employ strangulation. You slowly, methodically kill off the legacy system one feature at a time with the equivalent features in the well-written/tested new system. You run both systems side by side, sharing the processing load until one day, years in the future, when you have replaced all the features and the legacy system has nothing to do. It's a hassle and it requires everyone to be organized from start to finish, but you eliminate the big-bang release risks.
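One way to picture "sharing the processing load" is a per-feature rollout percentage, sketched below in Python. This is only an illustration under assumptions: the feature names, the `rollout` table, and the idea of hashing a request id into a bucket are mine, not something the strangulation approach prescribes.

```python
import hashlib
from typing import Dict, Iterable


def bucket(request_id: str) -> int:
    """Map a request id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_system(feature: str, request_id: str, rollout: Dict[str, int]) -> str:
    """Pick "new" or "legacy" for one request, based on the feature's rollout percent."""
    percent_new = rollout.get(feature, 0)  # features not listed stay on legacy
    return "new" if bucket(request_id) < percent_new else "legacy"


def legacy_has_nothing_to_do(rollout: Dict[str, int], features: Iterable[str]) -> bool:
    """True once every feature is served entirely by the new system."""
    return all(rollout.get(feature, 0) >= 100 for feature in features)


# Hypothetical state partway through a migration.
rollout = {"invoicing": 100, "reporting": 25}

print(choose_system("invoicing", "req-1", rollout))   # always the new system
print(choose_system("payroll", "req-1", rollout))     # not migrated yet, so legacy
print(legacy_has_nothing_to_do(rollout, ["invoicing", "reporting", "payroll"]))
```

Because the bucket is derived from the request id rather than a random draw, the same request always lands on the same system, which makes side-by-side discrepancies easier to reproduce. Raising a feature's percentage is the small, reversible step that replaces the big-bang cutover.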

Luckily, with all the web services and factored out componentry we've been building in software over the past 10 years, the strangulation approach has never been more feasible.

There are lots of good YouTube lectures and agile community blog posts on the topic of second systems and strategies for implementing strangulation. Here's an old classic that introduces the concept better than I could:

http://www.martinfowler.com/bliki/StranglerApplication.html