by xondono 1728 days ago
> Buried in here are great examples of why rewrites don’t help

That has not been my experience. Rewrites do sometimes help, because in a lot of codebases there are too many “pet” modules or badly designed frozen interfaces.

Rewrites can help in those situations, because there are no sacred cows anymore. The issue is that a lot of people do rewrites as translations, without touching the underlying structures.


Agreed with this 100%.

So many posts here over the years of examples of 'how we rewrote from x to y and saw 2000% gains', where x and y are languages. Such examples are 100% meaningless. Rewrites from the ground up -should- always be way faster, since it's all greenfield. If trying to make a language comparison, rewrite the entire thing in both languages!

Yes, absolutely. I wrote an article a couple of months ago, which was trending here, where I got a 5000x performance improvement over an existing system. One of the changes I made was moving to Rust, and some people seemed to think the takeaway was “rewriting the code in Rust made it 5000x faster”. It wasn’t that. Automerge already had a Rust version of their code which ran a benchmark in 5 minutes. Yjs does the same benchmark in less than 1 second in JavaScript.

Yjs is so fast because it makes better choices with its data structures. A recent PR in automerge-rs brought the same 5 minute test down to 2 seconds by changing the data structure it uses.

Rust/C/C++ give you more tools to write high performance code. But if you put everything on the heap with copies everywhere, your code won’t necessarily be any faster than it would be in JS / Python / Ruby. And on the flip side, you can achieve very respectable performance in dynamic languages with a bit of care along the hot path.
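
To make the data-structure point concrete, here is a minimal sketch in Python. It is not what Yjs or automerge actually do; the function names are made up for illustration. Both functions return the same result in the same language, but the first does a linear scan of a list on every membership check, while the second uses a set for average constant-time lookups.

```python
def unique_in_order_list(items):
    """Deduplicate while keeping order, tracking seen items in a list.

    Each `in` check scans the list, so the whole thing is O(n^2).
    """
    seen = []
    out = []
    for item in items:
        if item not in seen:  # linear scan on every iteration
            seen.append(item)
            out.append(item)
    return out


def unique_in_order_set(items):
    """Same behaviour, but membership checks go through a hash set.

    Each `in` check is O(1) on average, so the whole thing is O(n).
    Assumes the items are hashable.
    """
    seen = set()
    out = []
    for item in items:
        if item not in seen:  # hash lookup
            seen.add(item)
            out.append(item)
    return out
```

The gap between the two grows with input size, and no change of language is involved.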

Not only greenfield, but the problem domain is much better understood. A lot of architecture choices are made in the early days of a project when the problem isn't sufficiently understood to make the choice correctly.

I'm a huge fan of writing the first version of anything as a problem-exploration prototype, intended to be discarded and rewritten. As Fred Brooks said, "you're going to rewrite anyway, you might as well plan for it" [0]

[0] paraphrased from https://en.wikiquote.org/wiki/Fred_Brooks "The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. […] Hence plan to throw one away; you will, anyhow."

In my experience, the prototype never gets thrown away when it should be, and sometimes it's never thrown away at all. It just gets extended, poorly, until development grinds to a halt because you can no longer add features or fix bugs without creating new bugs.

Then you either a) stop what you're doing and spend many months rewriting, or b) spin up a parallel team that does the rewrite, while the old team maintains the old code and does their best to add the most critical features and fix the most critical bugs without breaking anything else in the process.

Neither approach is good. (a) means you'll probably lose customers due to lack of progress on their pet issues. (b) means your development costs have doubled, and you have a team full of people who are demotivated and demoralized because they know they're working on something that's soon destined for the junk heap.

I usually build the first version expecting that it will live on for quite a long time (and sometimes/often be the only version), and build with an eye toward ease of refactor and even ease of rearchitecting. Yes, it's slower than building a prototype-quality product, and yes, sometimes product managers complain that the extra time needed will blow a market opportunity. Those PMs are usually wrong, and even if they are potentially right, building the prototype always takes longer than expected, so the PMs end up fretting over time-to-market anyway.

This is where profiling helps more. Find the weak parts of the code, try to optimise those. If the language proves to be a barrier then you have a justification for a rewrite.

All too often people don’t understand how to performance tune software properly and instead blame other things first (e.g. garbage collection).
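
As a rough sketch of what "profile first" can look like, here is a small example using Python's standard `cProfile` and `pstats` modules. The workload (`slow_join`, `work`) and the `profile` helper are hypothetical stand-ins, not code from any project mentioned in this thread.

```python
import cProfile
import io
import pstats


def slow_join(n):
    """Hypothetical hot spot: builds a string one piece at a time."""
    s = ""
    for i in range(n):
        s += str(i)
    return s


def work():
    """Hypothetical entry point for the code being investigated."""
    return slow_join(20000)


def profile(func, top=5):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
    stats.print_stats(top)
    return result, buf.getvalue()
```

Calling `profile(work)` and printing the report shows which functions the time actually went to, which is the evidence you want before blaming the language or the garbage collector.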

Most slow languages make escaping to C easy for cases where the language is the issue. Most fast languages make it easy to expose a C-API interface, so if the language is your issue, just rewrite the parts where that is the problem.

Of course eventually you get to the point where enough of the code is in a fast language that writing everything in the fast language to avoid the pain of language interfaces is worth it.
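
For a sense of how small the "escape to C" step can be, here is a sketch using Python's standard `ctypes` module to call `strlen` from the C library. It assumes a POSIX system where the C library can be located this way (it will not work as-is on Windows), and the `c_strlen` wrapper name is made up for illustration.

```python
import ctypes
import ctypes.util

# Assumption: POSIX platform. find_library may return None, in which
# case CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of data as measured by C's strlen.

    Like the C function, this stops at the first NUL byte.
    """
    return libc.strlen(data)
```

In a real project the C side would be your own hot-path function compiled into a shared library; the calling pattern stays the same.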

And there are times when even C isn’t sufficient and a developer needs to resort to inline assembly. But most of the time the starting language (whatever that might be) is good enough. Even here, the issue wasn’t the language, it was the implementation. And even where the problem is the language, there will always be hot paths that need hardware performant code (be that CPU, memory, or sometimes other devices like disk IO) and there will be other parts in most programs that need to be optimised for developer performance.

Not everyone is writing sqlite or kernel development level software. Most software projects are a trade off of time vs purity.

That all said, backend web development is probably the edge case here. But even there, that’s only true if you’re trying to serve several thousand requests a second on a monolithic site in something like CGI/Perl. Then I’d argue there’s no point fixing any hot paths and you should just rewrite the entire thing. But even then, there’s still no need to jump straight to C, skipping Go, Java, C#, and countless others.

Except when the program is actually written in C; then you'd better grab the Algorithms and Data Structures book and dust it off, or the Intel/AMD/ARM/... manuals.

Algorithms and data structures come BEFORE dropping to C.

These days it is rare that you can beat your compiler with hand-written machine code, and even if you can it isn't worth it because the difference is typically small and only applies to one specific machine.

Of course once in C you can often think about memory locality and other cache factors that higher languages hide from you.

Many applications still start in C; there is no dropping into C.

Quite true, a rewrite can help if it is also a "rethink". But you don't have to switch languages to get that effect--in fact you'll probably do better if you don't throw a new language/library into the mix.

My point was that, contrary to what is apparently a common impulse, rewriting the same thing in a different language while maintaining the lack of attention to performance considerations that was present in the first version isn't going to help much.

This is less an argument for a rewrite than an argument for redesigning parts of your codebase, which can be done much more easily than a complete rewrite.

The tricky thing is that it’s easy to end up with a result that’s not far off. Some modules will improve, but a lot of the time these kinds of bottlenecks tend to happen because the performant version is not very idiomatic (feels weird), it’s too verbose, or it’s too confusing to think through.

Unless you have the same team (and they learned the lesson the first time), you’re very likely to end up with modules that perform in a similar way.

Sometimes changing the language makes thinking about the problems easier.

I would argue that the rewrites help when the information architecture for the original code is proven to be wrong, and there is either no way to refactor the old code to the new model, or employee turnover has resulted in nobody having an emotional attachment to the old code.

That said, to slot in a new implementation you often have to make the external API very similar to the old one, which can complicate making the improvements you're after.

> there is either no way to refactor the old code to the new model

That doesn't happen. Write facades as needed. Even if they are slower than everything else write the facades so you can keep in production all along.
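
A minimal sketch of the facade idea, with entirely hypothetical class and method names: callers keep using the old `put`/`get` API while the implementation behind it is a new store with a different interface.

```python
class NewStore:
    """Hypothetical new implementation with a different interface.

    It takes batches of (key, value) pairs and reports missing keys
    through a default value instead of raising.
    """

    def __init__(self):
        self._data = {}

    def write_batch(self, pairs):
        for key, value in pairs:
            self._data[key] = value

    def read(self, key, default=None):
        return self._data.get(key, default)


class LegacyStoreFacade:
    """Exposes the old API (put/get, KeyError on a miss) over NewStore.

    Existing callers are untouched, so the new code can go to
    production piece by piece.
    """

    _MISSING = object()  # sentinel so a stored None still counts as a hit

    def __init__(self, backend=None):
        self._backend = backend if backend is not None else NewStore()

    def put(self, key, value):
        # The old API wrote one item at a time; adapt it to a batch of one.
        self._backend.write_batch([(key, value)])

    def get(self, key):
        value = self._backend.read(key, self._MISSING)
        if value is self._MISSING:
            raise KeyError(key)  # preserve the old error behaviour
        return value
```

The adapter costs a little on every call, which is the "even if they are slower" trade-off: you pay it to keep shipping while the internals change.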

If you get the object ownership and the internal state model wrong (information architecture) facades don't help you.

You can't put an idempotent or pure functional wrapper around a design that isn't re-entrant and expect anything good to come from it. If you get it to work, it'll be dog slow.

Last time I was in a rewrite, the boss had the old software on a computer next to him with the label "Product owner of rewrite". When asked how to do something, he regularly looked at what the old software did.