by SanderNL 1053 days ago
I have swung both ways and I think I now settle somewhere near "boring is good" and "repetition is harmless (compared to the astronomic costs of wrong abstraction)".

Especially repetition seems to be hated with the might of a thousand suns and while I get it, because I myself hated it, I now can see the beauty of it.

What is currently a superficial repetition - a bunch of endpoint handlers, some forms - will often turn out to be similar-looking instances of completely different problems. Repetition is quite often deeper than just "code looks the same".

It takes a while to get to know these nuances.

11 comments

I've seen my share of cases where the repetition was only there because the original author didn't know how to abstract the problem, say, using a simple struct. Or even easier, a simple function.

I've sometimes taken on factoring the repeated code using common abstractions, and when that was done it often turned out that there was much more to factor out. And that the code quality could massively be improved because there are often some tricky situations that can only be properly handled when everything goes through a single place with all the necessary context.

As an example, we had a config file with an ini-like structure that was parsed by some central parser code. So far, so good, but after the parse, the resulting key-value pairs were processed like this:

    if (string_is_equal(config_key, "foo_setting"))
        config->foo = parse_number(config_value);
    else if (string_is_equal(config_key, "bar_setting"))
        config->bar = parse_string(config_value);

Repeating this pattern dozens of times is decidedly not cool. It fails to express the common structure, and it impedes or prevents code optimizations and fixes (such as error handling).

Sometimes I was told that abstraction isn't worth the hassle and that boring and repetitive is good. Allegedly, by factoring the code we risk breaking it and making it harder to fix. Those people didn't see how broken and incomplete the code is precisely because there is a lack of abstraction. We can't even know that there isn't some "quux_setting" with broken handling code because of course there aren't proper tests for each config setting or combination of settings.

If it can be abstracted, often the abstraction can have unit tests. That's my goal, it is not just about avoiding boilerplate and repetition, it is that boilerplate and repetition are more surface area for errors.
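
As a rough sketch of what the factored version might look like (this is not the original code: the field types of `struct config`, the `settings` table, `config_apply`, and the return codes are all assumptions made for illustration), each setting becomes one row in a table, and parsing plus error handling lives in a single place:

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical config struct; field names follow the snippet above,
 * the types are assumed. */
struct config {
    long foo;
    char bar[64];
};

enum setting_type { SETTING_NUMBER, SETTING_STRING };

/* One row per setting: key name, value type, and where the value goes. */
struct setting_desc {
    const char *key;
    enum setting_type type;
    size_t offset; /* offset of the target field inside struct config */
    size_t size;   /* size of the target field in bytes */
};

static const struct setting_desc settings[] = {
    { "foo_setting", SETTING_NUMBER,
      offsetof(struct config, foo), sizeof ((struct config *)0)->foo },
    { "bar_setting", SETTING_STRING,
      offsetof(struct config, bar), sizeof ((struct config *)0)->bar },
};

/* Applies one key-value pair to the config.
 * Returns 0 on success, -1 for an unknown key, -2 for an invalid value. */
int config_apply(struct config *config, const char *key, const char *value)
{
    for (size_t i = 0; i < sizeof settings / sizeof settings[0]; i++) {
        const struct setting_desc *d = &settings[i];
        if (strcmp(key, d->key) != 0)
            continue;

        char *field = (char *)config + d->offset;
        switch (d->type) {
        case SETTING_NUMBER: {
            char *end;
            errno = 0;
            long n = strtol(value, &end, 10);
            /* Reject empty input, trailing garbage, and overflow. */
            if (end == value || *end != '\0' || errno == ERANGE)
                return -2;
            memcpy(field, &n, sizeof n);
            return 0;
        }
        case SETTING_STRING: {
            size_t len = strlen(value);
            /* Reject values that do not fit, including the terminator. */
            if (len >= d->size)
                return -2;
            memcpy(field, value, len + 1);
            return 0;
        }
        }
        return -2; /* unknown type in the table */
    }
    return -1;
}
```

With this shape, a handful of unit tests on `config_apply` cover every setting of a given type, and an unknown key such as "quux_setting" gets reported instead of silently ignored.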
I hate repetition because it's nearly always laziness - it takes less thought/time to copy and paste a few lines of code than it does to factor them out into a reusable function and decide where to put it (and with what name). I'm talking about scenarios where the business logic needs to be exactly the same in both cases, and there just happen to be multiple ways to reach that point.

On the other hand I also hate having to deal with shared functions that have been repeatedly adapted and extended to be able to deal with all the various edge cases to the point they have 20 cryptically named parameters and no reasonable way of guessing what the output should be for given set of inputs.

But I'd still say more of my time is used up dealing with problems caused by lazy copying & pasting than by shared functions becoming overcomplicated or buried under excessive layers of abstraction.

> I'm talking about scenarios where the business logic needs to be exactly the same in both cases, and there just happen to be multiple ways to reach that point.

I've seen lots of these cases turn out to be "the business logic happens to be exactly the same in both cases".

It might have been a single feature at one point where it should have been identical, but two flows going there in two different ways means it's serving two masters. Often it will diverge as the product grows and those flows become their own features with differing requirements.

---

And also - "it takes less thought/time to copy and paste a few lines of code than it does to factor them out into a reusable function and decide where to put it (and with what name)."

Yes, that's the point. It takes both less time and thought, and it often diverges anyways. You are wasting time and creating complexity.

But the divergence is more often than not unintended and causes inconsistent/unexpected behaviour when a change is made in one copy and not the other, which is why it ends up using more time in the end. And how is extracting a few lines of code into a function and calling that more complexity than having two identical copies of it?
When the divergence comes from subtle differences in requirements, then those subtle differences in requirements now need to be baked into your single function (or, really, broken out from the function; but they have to be recognized as different to begin with). Now, the next time you need to address a new feature along one pathway, you must also be certain that you are not subtly breaking some completely unrelated feature requirement.
If the function grows like you describe it's because the developers are doing bad work. Instead of extending the function into a monster it should be split appropriately according to the new requirements. In some cases it may end up being multiple classes and that's fine. What isn't fine is cramming multiple classes worth of complexity into one function just because it almost did what you needed.
> If the function grows like you describe it's because the developers are doing bad work.

Exactly. Evolving a nice function into one that accepts lots of arguments is a product of the same mindset that copy-pastes code.

Indeed... laziness in both cases! I certainly admit I've been guilty of "ooh this function already exists to do this, except in this case I need to tweak it slightly, so I'll just add another parameter" - which is usually OK the first or second time: the issue is when it sets in motion a pattern of behaviour that other devs keep following without stopping to question "has this function grown too complicated", or worse "I know this needs refactoring, but we need this bug fix in now, I'll create a tech debt ticket and come back to it later", but of course never do.
Laziness is good though. If repetition requires less work for the same outcome, that's good. If abstraction or automation of some kind (like codegen) requires less work, then that's good.

But the question is, "less work over what time scale?". Repetition usually requires less work over short time scales but often requires more work over longer ones. But not always! I see people abstracting and automating things in throwaway scripts, tools, and PoCs. That is a waste of time.

There is a series of xkcd comics about this, which are all spot on: https://xkcd.com/974/, https://xkcd.com/1319/, https://xkcd.com/1205/.

I refer back to that table in 1205 pretty often :)

> Repetition usually requires less work over short time scales but often requires more work over longer ones

Yes, that's been exactly my experience, and not by a small amount either.

Yes definitely. But the point is that you have to ask yourself what timescale matters. It isn't always "optimize for the long term" and it isn't always "optimize for the short term". It really depends on what you're up to.
There are also valid use cases for preferring boring repetition.

Most of the time when I do any kind of coding work, it's as a single person project for companies/orgs with absolutely zero tech people. For instance, the last project I took on was for a bra shop of 3 people. No IT. Nobody on staff who has any idea how anything tech works - the closest is the person who taught herself the basics of using Shopify + advertising platforms.

Which means that in the future if they need somebody to look at or update anything I've done, I can't assume they're going to have access to a well-trained, talented person - it's equally likely they'd go "hey so and so's kid does 'computer stuff' let's see if they can fix it for us." Knowing that the next person who looks at my work might have very basic level skills leads me to prefer being repetitive and 'simple' over the more efficient solutions I can think of.

It's the difference between a regular Wikipedia article and the Simple English Wikipedia - if I'm writing something that is likely to be maintained by beginners or apprentices, making it easy to understand and work with matters a lot more.

I have what I call the "10 second rule". The rule is that an experienced programmer (i.e. someone who has written the type of code your codebase is written in, whether Python, JS, etc.) should be able to look at a code snippet, any code snippet in your code, and figure out what it does in about 10 seconds. There are obviously exceptions to this where complexity can't be avoided, but overall I've found the tradeoff is worth it, especially on a team of multiple devs.

I currently work for a client that has a very, very large React Native app that is pretty hard to wrap your head around. I can't tell you how many millions of dollars they have wasted in dev hours because it takes a dev way too long to even figure out what the code is doing before they can start making changes.

I always try to imagine that I'm coding for other people rather than coding for myself. This helps me to remember to make the code as easy to understand as possible.
Mine is the "10th grade rule".

A 10th grader should be able to understand it.

I like your "10 second rule"! I have a slightly modified version, the "look, a fly" rule. If I can lose focus for a bit and return to the function without being confused, it's probably good!
Always a continuum. Repetition is simple and good until it sucks :) Intuiting the right balance is an important part of the craft.
I thought about this a while and concluded that the right question is whether the repetitions are intended to do the same thing. If they are, deduplicate them. If not, leave them alone to evolve independently.
I feel the same way. Repetition is fine when you're working within a solid, well established framework. If you're writing 100s of GET endpoints, and most of them require you to prefix your endpoints with some @GET and @RequriesToken decorators, so be it.

If there are other forms of repetition within the user-code, there are ways of dealing with it. But writing a super class to solve superficial code repetition is most often the wrong way to go.

I've come to the same conclusion as you. Code is typically read more times than it is written/modified, so it is better to optimise for the common case, i.e. code that is tedious to write but simple to read and follow.
Repetition can definitely cause big issues.

Most of the time you want business logic in the same place. In a big & old codebase it can be very hard to fix things if the logic lives in multiple places.

But certain things can be easily repeated.

The beauty of repetition? Wow... I must be out of the loop
My rule is never copy paste. If it's worth repeating, it's worth typing.