by spectre256 4111 days ago
Joel Spolsky has one of the most convincing discussions on this topic: http://www.joelonsoftware.com/articles/fog0000000069.html

"Back to that two page function. Yes, I know, it's just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I'll tell you why: those are bug fixes. ...

Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it's like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.

When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work."


It doesn't convince me. There's no one answer that's correct for all situations.

Sometimes, had it been written well from the beginning, there wouldn't have been as many bugs in the first place, though.

I've seen codebases that really shouldn't be redone, just maintained and slightly patched over time - the risks are too high, the maintenance is low, patches are at least understood (mostly). I've also seen other codebases that should be scrapped and restarted. It really depends on the skills, commitment and expectations of the parties involved, and there's no one answer that fits all situations.

This time, PHK couldn't claim to have identified any serious issues, yet he still started from scratch, dropping even the goal of supporting everything NTPd already fully implements. And he was paid to find and fix the bugs, not to build one more proof of concept.
Wasn't specifically defending the PHK decision, more just against the Spolsky article. I've seen it quoted as gospel over the past decade, and ... it just doesn't hold true in all situations.
It depends on the initial quality of the code. Spolsky happens to work with very talented people who tend not to write bad code, so if something looks weird and you don't know why it's there it's easy to give someone the benefit of the doubt.

But if you don't work with talented/skilled/experienced people and there's "weird stuff" in the codebase it might well be for no good reason at all.

You can only invoke the Spolsky "it's bug-fixes!" argument if you're not cleaning up someone's horrific mess.

Exactly, but that nuance is lost on many people. I've lost count of how many people have quoted "don't restart from scratch!" and cited that Spolsky article as some sort of irrefutable wisdom of the ages.

Is it the Boy Scouts that promote "leave the campground cleaner than you found it"? That's the attitude I try to bring to projects, but there are limits on both time and effort. I can leave you (client, employer) a much better system (by whatever metrics you want to establish) by rebuilding from scratch when what I'm starting with is a broken, unstable mess. Not always, but the idea shouldn't be dismissed out of hand because of something Joel Spolsky wrote about Netscape in 2000.

I used to work at a company that cached international shipping rates. Those rates are BASICALLY by country because most of the fuel is burned to move the package the gross distance from one country to another.

But every once in a while we'd end up paying double for a shipment because the customer was way out in the sticks or whatever. Had we done more dynamic stuff like taking the whole address into account when quoting a price to a customer we'd never get bit by this kind of problem. But it was only a couple of times a month so I didn't worry too much about it.

My replacement found that this bothered him a lot and he figured he'd score points by fixing this problem. So he did exactly that, transitioned the entire quote system from local database lookups to remote UPS/FedEx/USPS/etc calls. 2-4 rates per shipper (Ground, Air, etc) for a total of about 10-15 every time a customer wanted a quote. And because we would repackage stuff (it was a logistics company) we often never knew the exact weight so we'd quote 3-4 prices so people could get a feel for which choice was their best bet for the best rate without delaying everything by an extra day or two in order to get a hard quote.

We cached these rates by country and weight (up to 1000lbs), so between all the service offerings and whatnot it came to about 100,000 pieces of information in our existing, if occasionally incorrect, database. So there were two choices:

1. Don't do any caching and just look them all up in realtime for customers. They're web APIs so there's latency associated.

2. Cache, but on a per-address basis. We had an address book for our customers so we knew the couple of addresses they would want to ship to and we could aggressively warm the cache so that all the rates would already be there. But there were about 10k unique addresses in the database * 100k total rates = 1 billion rates that needed to be cached.
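
For anyone who wants to redo the arithmetic in option 2, here is a minimal sketch. The two counts come straight from the numbers above; the bytes-per-entry figure, the API call rate, and the one-call-per-entry assumption are hypothetical values picked only to show the scale, not anything the carriers or the company published.

```python
# Figures taken from the comment above.
RATES_IN_COUNTRY_CACHE = 100_000   # entries in the by-country/weight cache
UNIQUE_ADDRESSES = 10_000          # addresses in the customer address book

# Hypothetical values, not from the thread: used only to illustrate scale.
ASSUMED_BYTES_PER_ENTRY = 64
ASSUMED_API_CALLS_PER_SECOND = 100


def per_address_cache_entries(addresses: int, rates_per_address: int) -> int:
    """Entries needed if every address gets its own full rate table."""
    return addresses * rates_per_address


def warm_up_days(entries: int, calls_per_second: float) -> float:
    """Days needed to fill the cache, assuming one remote call per entry."""
    seconds_per_day = 86_400
    return entries / calls_per_second / seconds_per_day


entries = per_address_cache_entries(UNIQUE_ADDRESSES, RATES_IN_COUNTRY_CACHE)
storage_gb = entries * ASSUMED_BYTES_PER_ENTRY / 1e9
days = warm_up_days(entries, ASSUMED_API_CALLS_PER_SECOND)

print(f"{entries:,} cache entries")
print(f"about {storage_gb:.0f} GB at {ASSUMED_BYTES_PER_ENTRY} bytes per entry")
print(f"about {days:.0f} days to warm at {ASSUMED_API_CALLS_PER_SECOND} calls/s")
```

Even with generous assumptions, the entry count alone is 10,000 times the size of the by-country cache, which is the whole point of the estimate.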

When I presented this back-of-the-envelope calculation in a meeting, do you know how he blew all of it off?

"Premature optimization is the root of all evil" -- Donald Knuth

I was so flabbergasted that someone could be aggressively ignorant and yet somehow twist Knuth's words to support their own position that I simply gave up. I was dealing with a powerful stupidity and it was stronger than me.

I later heard that during the transition it was touch-and-go for about a week and they had to issue a lot of credit to pissed off customers. The rate quoting page went from about 20ms to render (and maybe 300ms to load) to about 4 seconds.

The short (but perfect) JWZ text is here:

http://www.jwz.org/doc/cadt.html

"that's what happens when there is no incentive for people to do the parts of programming that aren't fun. Fixing bugs isn't fun; going through the bug list isn't fun; but rewriting everything from scratch is fun (because "this time it will be done right", ha ha) and so that's what happens, over and over again."

But the longer one, containing more or less the quote I first approximated, I just can't find. If I remember correctly, he wrote about Netscape, the code for FTP and how long it took to get it right in all edge cases, and then it was thrown away.

This was written in 2003: I do not think it is a coincidence that Kent Beck is credited ( http://en.wikipedia.org/wiki/Test-driven_development ) with re/discovering Test Driven Development in that year.

TDD can go horribly wrong, too -- but if your tests are on high-enough-level functionality, and you maintain them, then they can encompass the lessons you learn from fucking up every time you do it. Writing and maintaining tests isn't fun, just as fixing bugs isn't fun -- so in spirit these kinds of arguments hold just as much sway: there are fun parts of programming, and there are parts of programming that are significantly less fun. But even if you fork and start from scratch, it is feasible to keep a checklist of bugs/features and work through them in the order they were originally developed.

Do most open source projects do this, even the more well-maintained? Of course not. However, if people are seriously worried about this phenomenon, that's probably one of the ways developed in the decade since this essay was published to approach it.

I hardly think PHK is ADT at all. There are often good engineering reasons to rewrite software. I think Joel's point is that it is more often undertaken for wrong reasons.
There's software that works, he gets the money to find the bugs and fix them, and he decides to write from scratch something that certainly isn't a replacement for the existing software except for some specific users. That is very ADT, exactly matching JWZ's definition:

"This is, I think, the most common way for my bug reports to open source software projects to ever become closed. I report bugs; they go unread for a year, sometimes two; and then (surprise!) that module is rewritten from scratch -- and the new maintainer can't be bothered to check whether his new version has actually solved any of the known problems that existed in the previous version."

http://www.jwz.org/doc/cadt.html

He got the money to look for the bugs and to fix them (hard). Instead, he goes off to create brand-new, undiscovered bugs in brand-new code of his own (easy).