Fixing under-engineered code vs. fixing over-engineered code (github.com)
20 points by Dobiasd 1760 days ago
13 comments

This is the opposite of my experience.

Under engineered code tends to be simple, straightforward work with a low blast radius, such that "make one change and test" covers most cases.

Over engineered code tends to be more convoluted, with more fan in, more fan out and a large dependency graph. Changes become more like high pressure bomb squad work, where cutting the wrong wire blows up the whole project.

If I get a task to "make the button blue" I'd rather do it in a repo where I need to grep around a little than in one where I need to debug which button factory factory library is being pulled in and applied to the parent docker image of the one running.

If we're sharing experience with underengineered code, let me have a go.

At $BUSINESS we have a very successful marketplace that brings together buyers and sellers! We've recently IPOd. We get a lot of new items in and we have an internal page which is used by multiple full-time employees to approve new items, maintaining quality and defeating spammers. The code is written in HTML::Mason templates in Perl (which is basically pretending to be PHP, but for Perl).

The code makes a query to the database that joins the main `items` table with millions of rows with about 5 other tables. It does complicated locking logic in this query, and if this logic fails, the multiple full-time approvers cannot effectively coordinate their work, and the site cannot make money off new items. The HTML and the code of the loop are interspersed, and there are additional queries issued as you go through the loop. The code outputs JavaScript snippets to the page inside the loop which manipulate data structures incrementally.

> Under engineered code tends to be simple, straightforward work with a low blast radius

Hahahahaha this was a delicate multi-month project to split code and presentation within the existing codebase (outputting a single JSON blob instead of writing out incremental append operations to the page source), installing a separate locking system, performing a zero-downtime cutover to this new locking system, following up with a zero-downtime cutover to a backwards-compatible new subsystem, switching that system over to more-scalable queries — then finally developing a much more ergonomic but less acutely critical new frontend to improve productivity.

You did all that in a few months? That actually sounds like something that went quite smoothly. The time needed to engineer that kind of high-scale system at the outset is also going to enter the multi-month range, and with less background info to inform it. The phase-shifts of scaling are by no means easy, but they respond well to effortful, issue-by-issue grinding away at the problem.

The worst-case overengineered projects generally have organizational issues that preclude ever making concrete progress or result in a Juicero-style "why did you even make this" product. Those are the projects that burn people out.

LOL I'm very sorry that happened. None of this is scientific - over and under engineering in this thread are abstracted from actual results. I'm more than willing to admit that convoluting concerns between presentation and billing is nuclear waste grade underengineering.

For my example of overengineering, I was specifically working with a project where every web page on a public site was taking a couple of seconds to load. A team had reimplemented npm using inheritance in docker. Because they had one library that just imported stuff (to populate the parent docker image), their webpack build was unable to distinguish between imported and unimported code, and was just packaging everything. 20 MB websites.

That feels like a really high class problem to have. You have a product that got you to IPO, and it hit the end of its lifespan and needed to be upgraded/rebuilt.

That sounds over-engineered to me. But just poorly over-engineered, the worst of both dimensions.

Your experience largely mirrors my own. At a previous employer, I had to make some changes to a process that was importing data from a vendor. Typical straightforward ETL, right? Not even a lot of data, like 20-30 records daily.

The process? Load the data from CSV to JSON, ship the JSON off to Azure. Pull the data back from Azure, check to see if it's been processed, apply the change on premise. If failed, reschedule for later. Multiple processes, multiple scheduled jobs, on-call alerts, etc, etc. The whole thing could have been replaced with maybe 50 lines of Python. Instead, it was probably around 10k lines of C# and a dependency on a 3rd party ETL tool. It was a fucking mess. Worse yet, I wasn't allowed to fix it.
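
For a sense of scale, here is a rough sketch of what such a short Python version could look like. Everything specific in it is an assumption for illustration only: the `id` and `name` columns, the in-memory `store` dict standing in for the on-premise system, and the function names. It is not the actual process described above.

```python
import csv
import io


def load_records(csv_text):
    """Parse the vendor CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def apply_records(records, store):
    """Apply each record to the store and return the records that failed.

    Assumption: a record is applied by upserting it under its "id" column.
    A record without an "id" is collected so it can be retried or reported.
    """
    failed = []
    for record in records:
        try:
            store[record["id"]] = record
        except KeyError:
            failed.append(record)
    return failed


# Hypothetical daily file with two records.
sample = "id,name\n1,widget\n2,gadget\n"
store = {}
failed = apply_records(load_records(sample), store)
```

With 20-30 records a day, a single scheduled run of something like this, plus a log line for anything left in `failed`, would plausibly cover the same ground.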

This reminds me of a colleague creating a Python class like this:

    import os

    class Bla:
        def __init__(self):
            self.var1 = os.environ["VARIABLE1"]
            self.var2 = os.environ["VARIABLE2"]
            # ...
            self.varN = os.environ["VARIABLEN"]

        def get_variables(self):
            return self.var1, self.var2, ..., self.varN

They took 2 months to write insanely complicated code just to copy a few objects from one S3 bucket to another... And because the Lambda was timing out, they set the timeout to 15 minutes, leading to our Lambda costs skyrocketing because the function was failing/retrying all the time for some objects.

I was allowed to fix the timeout issue to save cost but I was forbidden to fix the code itself because our manager said "Well, it works so let's move on".

6 months later, on a Tuesday morning, bored, I decided it was enough so I rewrote this bloody Lambda in ~90 lines with a proper handler, retries, logging statements, etc ... in a couple of hours.
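
A minimal sketch of the core of such a rewrite, under stated assumptions: `copy_objects`, its parameters and the retry policy are hypothetical, and `client` is anything exposing a boto3-style `copy_object(Bucket=..., Key=..., CopySource=...)` method. The Lambda handler wiring and the creation of the real S3 client are left out so the block stays self-contained.

```python
import logging
import time

logger = logging.getLogger(__name__)


def copy_objects(client, src_bucket, dst_bucket, keys,
                 max_attempts=3, delay_seconds=1.0):
    """Copy each key from src_bucket to dst_bucket, retrying failed copies.

    Returns the keys that still failed after max_attempts tries, so the
    caller can report them instead of letting the whole function time out.
    """
    failed = []
    for key in keys:
        for attempt in range(1, max_attempts + 1):
            try:
                client.copy_object(
                    Bucket=dst_bucket,
                    Key=key,
                    CopySource={"Bucket": src_bucket, "Key": key},
                )
                logger.info("copied %s", key)
                break
            except Exception:
                # Assumption: any error is retryable. Real code would
                # probably catch the client's specific error type instead.
                logger.warning("attempt %d/%d failed for %s",
                               attempt, max_attempts, key)
                if attempt < max_attempts:
                    time.sleep(delay_seconds)
        else:
            # The inner loop never hit `break`: every attempt failed.
            failed.append(key)
    return failed
```

Bounding the retries per object is the design choice that matters here: one bad object costs a few attempts rather than a 15-minute timeout.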

What did my manager say? "Good work but you should have taught them rather than doing it yourself".

He is right though, gotta stop the problem at its source.

Often, however, that is a far larger and more difficult task, and it is easier and better for one's sanity to just fix the things that affect you directly.

To be slightly pedantic, one might call this situation over-architected rather than being purely over-engineered. I would agree that over-architected solutions (regardless of the internal engineering quality) are definitely hard to alter due to the many disconnected disparate parts, not to mention any human/political boundaries that have grown up around the implementation.

Following along that train of thought, under-architected solutions are often great to update because you get to make logical cleavages that are informed by time spent in actual production use, giving you a much better basis for any decisions.

Amen. And the worst part? Can't even fix it easily, because they all depend on each other. Someone is getting the data in JSON format and depends on the filename being exactly YYYYMMDD in a specific directory of a specific server. Some horrible open source framework can only handle data at 200 requests/sec (and of course every row is a request) and you have a distributed asynchronous queue plus rate-limiter to make it work. Yada yada.

Your experience agrees with mine. It may be TEDIOUS, but it's generally easier to fix underengineering which (IMO) tends to be wide but not deep issues.

Overengineering has its hooks everywhere; even WITH tests, changing something changes everything.

I've come to realize that at least for the level of engineering I'm exposed to that boilerplate is not always bad, copy/paste is perfectly valid to a point (which for me is usually "2-3"), and DRY is a tool, not a design goal.

Over-engineered code has a lot of coupling where incidental equivalence may have been misapplied as fundamental sameness. This was discussed here on one of my favorite PLC programming blogs:

https://www.contactandcoil.com/automation/industrial-automat...

Copy and paste are great tools. Making all 6 buttons in a particular grid with the code:

    grid.AddNewButton(1, 1, "Thing 1", Color.White, Color.Black, onClick1());
    grid.AddNewButton(1, 2, "Thing 2", Color.White, Color.Black, onClick2());
    grid.AddNewButton(1, 3, "Thing 3", Color.White, Color.Black, onClick3());
    grid.AddNewButton(2, 1, "Thing 4", Color.White, Color.Black, onClick4());
    grid.AddNewButton(2, 2, "Thing 5", Color.White, Color.Black, onClick5());
    grid.AddNewButton(2, 3, "Thing 6", Color.White, Color.Black, onClick6());
keeps this incidental sameness in mind. Just looking at the above code causes programmers everywhere (myself included) to imagine ways in which the above could be done with a `for` loop, computing the row and column numbers from the index, using string concatenation for the labels, creating an array of onClick handlers and indexing into it... But that forces fundamental sameness where there may not be any; a little repetition doesn't hurt.
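
For comparison, the loop version described above might look roughly like this in Python. The `Grid` class is a hypothetical stand-in for the widget in the snippet, and the handlers are placeholder lambdas; neither comes from the original code.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """Hypothetical stand-in for the grid widget in the snippet above."""
    buttons: list = field(default_factory=list)

    def add_new_button(self, row, col, label, fg, bg, on_click):
        self.buttons.append((row, col, label, fg, bg, on_click))


COLUMNS = 3

# Placeholder handlers; `i=i` binds the current value for each lambda.
handlers = [lambda i=i: f"clicked {i}" for i in range(1, 7)]

grid = Grid()
for index, handler in enumerate(handlers):
    # Compute the 0-based row and column from the index, then shift to 1-based.
    row, col = divmod(index, COLUMNS)
    grid.add_new_button(row + 1, col + 1, f"Thing {index + 1}",
                        "white", "black", handler)
```

It is shorter, but the first time "Thing 4" needs a different label, color or position, the loop needs a special case, while the six explicit lines need a one-line edit.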
Foreground and background color should be moved out to avoid repetition, e.g. BUTTON_FG, BUTTON_BG. All other parameters are different, so no repetition.

> If I get a task to "make the button blue" I'd rather do it in a repo where I need to grep around a little

Except it's not one button, it's 12 of them and they all use different UI elements with different names and syntax, and you can't change one of them because doing so would break some unrelated business critical class.

> Under engineered code tends to be simple, straightforward work with a low blast radius

Under-engineered code can easily become over-engineered code, while retaining the appearance of being the former, as engineers keep working on it over time. That slide only becomes easier over time, as bugs are uncovered by users and then "fixed", as features are requested by users and then bolted on, and so on.

> If I get a task to "make the button blue" I'd rather do it in a repo where I need to grep around a little than in ...

In the kinds of codebases I'm talking about, you will successfully end up making the button blue, but then, either:

1. something else breaks and you wouldn't know about it until later; and/or

2. a bunch of seemingly unrelated tests break and you will need to debug which button provider is auto-injected into which tests' setup routines.

Bad code is not related to over engineering though.

Over engineered means that it is better than the specifications require, but the parts that exceed the specifications happen to be useless or unused. Time and money were wasted, but the product is not worse by any means.

So this article concludes that starting with a certain project size, over-engineered code is easier to work with than under-engineered code, and scales well when the project grows (even "linear with the size").

But what this article is actually about is the entanglement of individual parts. Under-engineered implies spaghetti code with lots of entanglement. Over-engineered code has only the minimum needed amount of entanglement.

Of course this is right on the topic of entanglement.

But this is not so much my understanding of the terms over-engineered vs under-engineered.

I understand over-engineered as having too many abstractions. And those can turn out to be the wrong abstractions when they are actually used, which then requires more work to refactor.

I understand under-engineered as having too few abstractions, and having built in some assumptions which might hold for one use case but not for others. This can also require some effort to fix.

In practice, this is also a continuum, and you might even have both things mixed.

And when actually writing some new code, it is often hard to know the right amount of complexity.

This is why the overall quality greatly improves when you do a couple of iterations of rewrites from scratch. Because you keep improving on just the right needed abstractions.

Related:

http://number-none.com/blow/blog/programming/2014/09/26/carm...

https://github.com/Droogans/unmaintainable-code

This misses the point that overengineering takes longer to do, so it adds costs to the initial development process. Unsurprisingly there’s less cost later - you’ve already paid a lot of it!

That seems like the best case scenario: the over-engineering is costly upfront and worth it later on. But if there’s a problem in the over-engineering, e.g. mistaken assumptions, fixing it is another round of expense.

So over-engineering is “expensive now and hopefully cheap later”, while under-engineering is “cheap now and maybe expensive later”. In one scenario you get 1-2 rounds of expensive. In the other you only get 0-1 rounds of expensive.

With that in mind, under-engineering the first implementation might be the sensible default choice.

You're absolutely right, of course. :) While this was not the point of this article, I just added a remark about this important fact nonetheless: https://github.com/Dobiasd/articles/commit/2a251204183cb45e7... Thanks!

This pretty much just comes down to estimating how much you're going to need and then going for that in the first place.

Which I imagine is what people would be aiming for anyway? I don't think anyone is using YAGNI knowing they are in fact going to need it.

The other issue is most engineers want to over-engineer. So anything that encourages that without noting the real costs should probably be tamped down.

> The other issue is most engineers want to over-engineer.

My experience is that everyone wants to engineer to their own standards. In a team setting, people with different standards (and tastes) end up producing an "over-engineered" code base.

Most "over-engineered" code is not like a puzzle. It's an ever-growing amorphous pile of playdough. Removing pieces of playdough from the middle of it is very difficult.

The OP seems to be assuming the abstractions in the over-engineered code are all clean. If that's the case, the abstractions really can be like partially-solved portions of a jigsaw puzzle. It's a common, flawed conceit that you can design clean abstractions up-front with pure engineering discipline/process, though.

This seems to miss the point: engineers need to communicate with business stakeholders to understand the business needs and where they are in product development. Then adjust your engineering efforts based on that.

The worst case is you delay learnings by over engineering a product, make things more complicated, bake in a lot of rules and assumptions that aren't true, and generally build a Ferrari when what the business needed was a go-kart.

Engineers tend to optimize for the engineering experience and miss the broader business experience. I love it when I have to go back and clean up technical debt that we leveraged to move faster, learn quickly, and find product market fit. That is a high class problem. I also love it when we have to scrap under-engineered products we built, because we learned a lot and need to change directions. What I hate is when we have to scrap an over built system that was costly to build, took a long time, and I burnt a lot of political capital to build.

If you are in a mature product, and the business is confident in the direction, absolutely spend the time to build out and think through all the abstractions and really build an enterprise grade solution. But that shouldn't be the default, and it seems like a lot of engineers default to it because it's "best practice" or it optimizes for reducing future engineering pain.

So much of this best practice discussion seems reactionary to me.

Very few rules are broadly applicable to me as a software engineer on an existing code base:

- achieve the goal with the simplest set of changes possible.

- make your code look like the code around it.

- information is liability. The less of it moves around, the simpler and safer the code. “DRY” is only useful because it encourages thinking about APIs.

My only real rule that flows from this is to make sure a code base sets out good standards before making anyone else work on it.

If the over-engineering is in the wrong dimension (one that the project doesn't need), then the cost will be double: adding the over-engineering and then detangling from it.

The problem is there are too many dimensions along which a given project can be over-engineered if the future is uncertain. So even an educated guess has a good chance of being wrong.

Even if the guess is reasonable and is eventually true, like "we'll need to scale, so might as well prepare for 100x capacity now", it's still frequently a mistake to over engineer.

The architecture to support something that is not needed tends to introduce rigidity into the codebase, adding a tax to future changes in order to maintain those features.

This may be slightly off-topic, but it seems to me that sometimes, YAGNI is a judgement made by someone who doesn’t have the same <insert adjective here> grasp of the problem as you do.

Most recent case in point: a good friend recently became a work colleague and peer and we work extraordinarily well together, better than any previous working relationship of mine. In addition to the maturity and battle scars that come from ~25 and ~30 years of industry experience, and very different experience, I’d say the biggest reason for this is our very complementary work styles: he is a mix of top-down and bottom-up, write some code knowing it will be replaced later, while I am a come from the side and try to get the whole thing in my head and code incrementally across the whole stack person. (He recently called me lazy, which we both knew was a compliment, but future lazy, in that I will spend more time now to avoid a rewrite later.) We both know the value and cost of technical debt, and are willing to make it good enough for now when needed, leaving comments that link to issues for later, which is good.

Anyway, I’ve been the one mostly responsible for the workflow engine, while he has done the DAOs, APIs, and front end components. More than once I’ve added a seeming YAGNI, which we’ve discussed, and which I’ve defended with vague statements and hand waving. (We trust each other enough that we don’t need complete agreement.)

More than once I’ve come to a point where I’ve told myself that I really need X only to find that weeks/months earlier I added either a comment about X or a doX() stub or even a partial, notional implementation of X.

I am very logical and very defensive in my coding, but very intuitive in my understanding of how things hang together. It took me years, perhaps even decades to accept and trust that intuition.

Sometimes a YAGNI is just something you know you will need, but cannot yet articulate the why and the how of it.

(I have also had more than a few tf was I thinking moments with code that was solid and clean and somewhere between mildly wrong (e.g., edge case mishandled) and near-wholly-borked (as in, I don’t understand how this ever worked, oh, wait it mostly doesn’t and I/we got lucky), so it isn’t all smooth and clean.)

Where this comparison fails for me is that over vs under engineered is not the only dimension that is a factor in maintainability.

I have experienced very poorly engineered code (from a best practices standpoint) which easily would be considered over engineered, and very well engineered code (again from a best practices standpoint) which falls squarely in the under engineered bucket.

In my experience, the code that most closely follows best practices, be it either over or under engineered, will be easier to correct.

How do you strike a balance between the two? The sweet spot is somewhere in between.

Sometimes I've tried to leave comments and notes if I detect a good place for a seam (to use Michael Feathers's term), usually around an encapsulating function where I say something like:

    // TODO: This is the simple version that assumes [our current assumption].
    // If you need to handle multiple cases here, I suggest [X strategy].
    void doThing()
    {
        // existing code
    }

The definition of over engineered and under engineered is so poor that the comparison cannot make any sense.

If code has a lot of abstractions that are poorly designed, is that over engineered or under engineered? It seems like it should be over engineered, but if we analyze it, it means an inexperienced engineer designed the system, or not enough time was provided, which suggests that it was under engineered.

Over engineered would be something being way better than the required specs, but all the better parts are unneeded and unused.

This article assumes the overengineered abstractions make sense, as if they were all part of the same puzzle. Sometimes the code is just an abstract mess.

In my experience it is not so much about under or over engineering but about how the dependencies and the side-effects were handled.

Given the choice of working on 1 of 2 code bases that solve the same problem I will choose the one where the dependencies and the side-effects are more up front.

This seems kind of obvious though. Larger projects are more in need of the abstractions. You ARE gonna need it.

Generous to think that you would consider a project over engineered, but still rely on its internal abstractions.

The acronym YAGNI itself is flexible enough to handle this edge case accurately.

The GO4 were truly visionary.

GO4?

“Gang of Four”, the name commonly applied to the authors of the seminal book “Design Patterns”

https://en.m.wikipedia.org/wiki/Design_Patterns

Not sure but I suspect it would be the "gang of four", i.e., the authors of the canonical OO design pattern book.

Gang of 4 - 4 authors from the book