Hacker News
by BeetleB 1197 days ago
I've said it before and I'll say it again - I should encapsulate it as a law.

BeetleB's Law of DRY: Every article that complains about DRY will be a strawman argument.

The DRY acronym came from The Pragmatic Programmer, and almost every instance of DRY people complain about is not at all what is advocated in the book. There are different ways of interpreting what he wrote, but my version is: "If you have two separate requirements that are very similar, keep them separate in code. If your duplicated code is one requirement, then DRY it into one location in your code."

So this:

> Following “Don’t Repeat Yourself” might lead you to a function with four boolean flags, and a matrix of behaviours to carefully navigate when changing the code.

Is not DRY. In fact, having boolean arguments is almost a guarantee that you've violated DRY.

Another way to know you've violated DRY: If one requirement changes, do I need to add if conditions to the DRY'd function to ensure some other requirement doesn't break? If yes, you're in violation.

Never tie in multiple requirements into one function. Or rather, do it but don't call it an application of DRY.
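To make the distinction concrete, here is a minimal sketch of what I take the comment above to mean. The shipping-fee domain, every function name, and every number are made up for illustration; none of it comes from the book or the article.

```python
# All names and numbers below are hypothetical, for illustration only.

EXPRESS_SURCHARGE = 10  # one requirement ("express costs 10 extra"), stated once


def merged_shipping_fee(total, is_wholesale=False, is_express=False):
    """The flag-driven shape: two customer requirements share one body,
    so a change to either one tends to mean another branch in here."""
    if is_wholesale:
        fee = 0 if total >= 500 else 20
    else:
        fee = 0 if total >= 100 else 5
    if is_express:
        fee += EXPRESS_SURCHARGE
    return fee


def retail_shipping_fee(total):
    """Retail requirement only: free shipping from 100, otherwise 5."""
    return 0 if total >= 100 else 5


def wholesale_shipping_fee(total):
    """Wholesale requirement only: free shipping from 500, otherwise 20."""
    return 0 if total >= 500 else 20


def with_express(fee):
    """The express requirement lives in exactly one place."""
    return fee + EXPRESS_SURCHARGE
```

The two fee functions look alike, but they answer to different requirements, so they stay separate. The express surcharge is a single requirement, so it gets a single home.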

13 comments

I personally agree with your point of view on what DRY is. But I don't think that's what's being taught or talked about anymore. When people say DRY they usually mean an abstraction layer that merges two similar concepts into one, and the DRY you talk about is just standard operating procedure. Static analysis tools will ding you on DRY rules if you have code that looks pretty similar, urging you to refactor it to be the same, and so on.
Literally had to tune out a super senior employee this morning who was talking about this flavor of DRY.

Invoking the same function from two places in the code with vaguely similar parameters is not repeating yourself, dude.

Sometimes I wonder if the amount of sanity in the industry is finite and adding more people just makes us all crazier.

It's a zero-sum sanity game, and the companies trade us around trying to win the most of it.
+100

Which is exactly why static analysis tools that force you to do something need to be shot. Static analysis tools that inform you about a possible duplicate are totally fine. Give me an option to disable that particular instance.

Coincidentally, micro-services do away with such problems in many cases because the code is "separate", so analyzers and sticklers don't find the "duplicates" and you can write beautifully simple code. Unfortunately that has the opposite problem of leading to things like this Netflix architecture https://res.infoq.com/presentations/netflix-chaos-microservi... but for something as simple as a personal blog (yes, I exaggerate - slightly).

In the end I think the only solution is to have the right people and stay small enough to keep the right culture. That probably goes against all your metrics and growth goals of the company of course.

The only term I’ve found that gets me any relief (and honestly, it’s not much) from this sort of person is “idiomatic”.

You don’t need to factor out idioms. That’s lunacy.

Doesn’t matter what it originally meant, people take it to mean you shouldn’t have repeated code and that’s the DRY we all live with which results in bad abstractions. I don’t think it’s a straw man at all, unless you use your specific definition of DRY which isn’t very useful
People are welcome to co-opt the acronym and give it another meaning. The issue is that the original DRY is a damn good principle, and it is more important to give it a name and propagate that knowledge.

If all we do is rail against the "new" DRY and forget the original one, then we are at a net loss.

I'm with you - the violations of DRY I still see regularly are clear cases of copying and pasting exactly the same logic (or magic literal value) when there was no reason not to put it in a helper function or named constant that could be referred to in both places. A code review I did yesterday had that - there were 5 or 6 lines of code that, starting with a particular regex, did some data massaging. It was determined in some cases a different reg-ex was needed for a second pass over the data, and so the submitter had simply copied the 5-6 lines of code and just changed the reg-ex used. I see that sort of thing 20 or 30 times more often than code that gets itself into knots because of excessive abstraction trying to avoid code repetition.
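For what it's worth, the fix in a case like that can be tiny. A rough sketch, assuming the "data massaging" is something like extracting and normalising a captured value; the helper name, both patterns, and the sample data are invented here and are not the code from the review:

```python
import re

# Hypothetical patterns standing in for the two regexes from the review.
FIRST_PASS = r"id=(\w+)"
SECOND_PASS = r"ref:\s*(\w+)"


def massage(lines, pattern):
    """Return the first capture group of each matching line,
    stripped and lowercased. Lines that do not match are skipped."""
    regex = re.compile(pattern)
    cleaned = []
    for line in lines:
        match = regex.search(line)
        if match:
            cleaned.append(match.group(1).strip().lower())
    return cleaned


raw = ["id=ABC", "ref: Xyz", "noise"]
first = massage(raw, FIRST_PASS)
second = massage(raw, SECOND_PASS)
```

The regex becomes a parameter and the 5-6 lines exist once, so a later fix to the massaging logic cannot be applied to one pass and forgotten in the other.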
Some people feel productive when they can type a bunch of code quickly. And the only way to do that is to get muscle memory for writing the same code over and over and over.
I think I can categorically state that virtually every DRY violation I've come across involved little more typing than Ctrl+C Ctrl+V.
I’ve dealt with a few bad coding patterns over the years that were not copy and paste.

Though there have been a couple memorable occasions where I had to completely remove a bad pattern from the code to get people to stop using it.

To be fair, in your example it sounds like the repetition is very local and easily recognised for what it is. Not ideal, but hardly a poster child for when DRY is impactful.

If the change was otherwise good, I would remark on the repetition as "here's how I would write it differently" and not "go back and fix it now".

The first time someone changes four of the cases the same way but misses the fifth, though. That sounds like a good time to refactor.

Just came across another slightly more interesting example. We have code that has to do essentially the same thing for three different document types: it loops through a passed-in list and for each item, adds an object to a document, then initialises that object with details from the item. The "logic" in all 3 cases is exactly the same, yet there were 3 different implementations, one for each document type. The only difference between the 3 is that the function you need to call to add the object is different for each document type, and the library we're using (the MS DocumentFormat.OpenXml library as it happens) doesn't provide an abstraction for just calling the same function regardless of document type. In fact, creating such an abstraction wouldn't be complex at all - if you look at the decompiled MS source, they've basically implemented the same function 3 times, but using an internal function not available to us. As it happens there was a bug in our initialisation code (missing null check!), but of course it was duplicated across all 3 functions. I basically had 3 options:

    a) make the same null check fix in all 3 versions, thus resulting in 3 even longer functions with the same logic
    b) collapse all 3 functions into one and provide an abstraction for the "adding object to document" part
    c) refactor the 3 functions to use the same helper function just to do the object initialisation
In the end I went with c) because it required the least amount of code, and the only genuine duplication really is the foreach statement. But arguably if MS had kept their library DRY in the first place we would never have ended up with so much code duplication.
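Roughly what option c) looks like, sketched in Python rather than against the real library. The classes and method names below are stand-ins I made up, not the DocumentFormat.OpenXml API, and I'm assuming the missing null check was on a field of the item. Only two of the three document types are shown to keep it short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    title: Optional[str]  # assumed to be the value that could be null


@dataclass
class Part:
    text: str = ""


@dataclass
class WordDoc:
    """Hypothetical document type with its own 'add' method."""
    parts: List[Part] = field(default_factory=list)

    def add_paragraph_part(self) -> Part:
        part = Part()
        self.parts.append(part)
        return part


@dataclass
class SheetDoc:
    """Second hypothetical document type; the 'add' method differs."""
    parts: List[Part] = field(default_factory=list)

    def add_cell_part(self) -> Part:
        part = Part()
        self.parts.append(part)
        return part


def init_part(part: Part, item: Item) -> None:
    """Shared initialisation, including the null check, written once."""
    part.text = item.title if item.title is not None else ""


def fill_word(doc: WordDoc, items: List[Item]) -> None:
    for item in items:
        init_part(doc.add_paragraph_part(), item)


def fill_sheet(doc: SheetDoc, items: List[Item]) -> None:
    for item in items:
        init_part(doc.add_cell_part(), item)
```

Only the loop is still repeated per document type. The initialisation and its null check have a single home, so the bug gets fixed once.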
This is a significantly better example! When non-DRYness leaks out over interfaces it becomes much more of a problem.
I love the principle but do we want to save the principle or save the acronym? At this point IMO it's a lost cause. I wish someone with a following would make a retronym of TIE or SYNC to express it.

I remember when The Pragmatic Programmer 20th Anniversary Edition came out, the authors, in interviews and in the new edition itself, described DRY as "the most misunderstood" concept in the book. If for 20 years that is the most misunderstood concept, then maybe the name is not the best.

I propose the following: DRBL - Don't Repeat Business Logic

It's funny, because "dribble" or "drool" is the opposite of being dry ;)

Although it doesn't have as fun of a pronunciation in English, DRBR is probably a better acronym? Don't Repeat Business Requirements

Well, DART (Don’t Assert Redundant Truths) might be a better name for the same principle, though it's perhaps somewhat opaque when doing imperative rather than declarative (e.g., functional/logic/relational) programming, since people are less likely to consider the former to constitute “asserting truths” in the first place. But it does get more to the point that the thing you want to avoid isn’t code that looks similar, or even code that mechanically does the same thing, but code that represents the same facts.
I have a set that’s harder to corrupt, but still not bulletproof:

Source of truth, system of record.

Business decisions should have a source of truth. But the domain of things we want to duplicate or not duplicate is bigger than just the business rules.

If you are asking two questions, it’s okay to have two implementations. If you ask the same question twice, you should get the same answer, not just the same output.

But miscreants can twist the meaning of five different parts of what I just said. Like what even is a business rule? It’s whatever the last thing they said before you got them to stop talking. But if they come back later and want something that disagrees with what they already asked, they’ll wriggle like a fish on a river bank trying to gyrate a way to interpret what they asked for to say you’re wrong (and therefore you should work nights and weekends and we don’t owe you a raise).

The problem is with the entire concept of development "principles". They are a bad way to propagate knowledge. I suspect more people have an incorrect understanding of DRY than not. Seems like a net loss to me.

We should ditch these principles altogether and focus on teaching a deeper understanding of these concepts that captures the nuances.

I don't think you can move away from principles in general. The reality is that most SW design is subjective. Not repeating code is a generally good principle. It's just that the misapplication of DRY is following one good principle but violating another one (requirements should be decoupled).

In any case, the reason I go on the anti-rant rant each time is because when I use DRY appropriately, I don't want some idiot flagging me in a code review saying "Don't do this. DRY is bad. Here are N blog posts explaining why" - when none of the blog posts are complaining about what I am doing.

I don't think you can separate the principle from its misapplication. It's misapplied because it tries to stuff useful knowledge into a memorable phrase and the nuance is lost.

Flagging code in code review is another great example of harmful behaviour principles encourage. I've stopped referencing principles altogether in code review and I encourage others to do the same. Instead I focus on trying to explain the specific impact the code will have on our specific codebase.

> I don't think you can separate the principle from its misapplication.

In practice you are right, but this is just part of the human condition. There is no substitute for experience. Wisdom can’t be taught. The map is not the territory. Yada yada.

Principles still have value as short-hand for knowledgeable practitioners though. In fact, they have outsize value in this case because strong programmers will recognize and reflect on both the upsides as well as downsides discussed here. Communication bandwidth is the single most important thing to scale teams working on irreducibly complex domains.

I also subscribe to the school of thought where “principles” went bankrupt. When I see someone quoting a principle as the sole reason for a code change, it immediately comes across as a shallow explanation without much thought behind it.
People are still coming up with best practices for accounting, a field with thousands of years of history. Principles are fine but not final
I don’t think it was co-opted, I think it never stuck. PP didn’t invent the idea of deduplication. I’m not even sure they invented DRY. But when that book was brand new there were already people misusing the idea of deduplication.
I feel exactly the same way about DI.

Structuring code so that abstractions don't depend upon implementation details is in my top 3 principles of all time (along with pure functions and good typing).

DI frameworks a la Spring and Guice just annoy me.

Yep. And then you get bloated constructors taking in a dozen arguments many of which are just needed to pass along to the parent class constructor, and - drum roll - the CI tools complain because your constructor is similar to one in another completely different class that just happens to need similar dependencies due to the abstractions and that's a copy-paste-detection fail.

So you refactor everything, factor out the constructor, and then it passes but now you need to add a new dependency so you're right back to the same nonsense. And/or you have tons of classes getting dependencies they don't even need, because some do so the parent has to have them all.

Traits can help some.

But the abstractions and DI that were supposed to make things easier still often make things more complicated.

DI has a place. In my opinion it's for plugin systems. If you have a core system that third parties can extend, DI can be brilliant.
I think it's very useful
Yeah I'm pretty burnt out on the DRY articles. It's so easy to misrepresent something as an absolute and talk about how it is wrong. As for DRY, it also means they don't understand / or aren't honest about what DRY is about and I immediately am skeptical about the author.
> As for DRY, it also means they don't understand / or aren't honest about what DRY is about and I immediately am skeptical about the author.

Indeed - I didn't bother reading the rest of the article.

Your re-definition of DRY is essentially that when code does the "EXACT" same thing it should be pulled into one place. Which is not something people will disagree with.

But the actual DRY definition is a little more nuanced.

> Every piece of knowledge must have a single, unambiguous, authoritative representation within a system

And this is what OP is referring to. It's the little abstractions that become big abstractions in the name of DRY that can over complicate code bases.

When it comes to heuristics intention doesn't matter. If the end result of DRY is that most people over-apply it then it is a bad heuristic.

> Your re-definition of DRY is essentially that when code does the "EXACT" same thing it should be pulled into one place. Which is not something people will disagree with

Sorry, not sure how this is my redefinition, and I would not by default agree to this. If you have code that does the exact same thing, but they are for separate requirements (which does happen), then I would not recommend refactoring to one function.[1]

If they are for the same requirement, I would.

> > Every piece of knowledge must have a single, unambiguous, authoritative representation within a system

> And this is what OP is referring to.

Sorry, but my original comment is that this is not what the OP is referring to. If you abstract into something with a lot of booleans, chances are that function is now related to multiple pieces of knowledge.

[1] I may still do it, but with the understanding that I may need to undo it when one of the requirements changes.

I prefer to use the rule, single source of truth. If you have some business logic make sure you always call the same code to run that logic.

That way you're not looking for abstraction layers to avoid lines of code. You are looking to make sure you don't have subtle bugs that only happen in some code paths. You also only have one place to update as the business rules change.

It's the same as dry but avoids confusion.
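A small sketch of how I read "always call the same code to run that logic". The discount rule, the threshold, and all names are hypothetical:

```python
# Hypothetical business rule and values, for illustration only.
DISCOUNT = 10
MIN_TOTAL_FOR_DISCOUNT = 50


def is_eligible_for_discount(order_total, is_member):
    """The single source of truth for the rule. Every code path asks here."""
    return is_member and order_total >= MIN_TOTAL_FOR_DISCOUNT


def checkout_total(order_total, is_member):
    """Code path 1: pricing."""
    if is_eligible_for_discount(order_total, is_member):
        return order_total - DISCOUNT
    return order_total


def cart_banner(order_total, is_member):
    """Code path 2: display. It cannot drift from pricing, because it
    never re-implements the rule."""
    if is_eligible_for_discount(order_total, is_member):
        return "Member discount applied"
    return "No discount"
```

If the business later changes the threshold, only `is_eligible_for_discount` (or its constant) changes, and the price and the banner stay in agreement.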

> The DRY acronym came from The Pragmatic Programmer

> almost every instance of DRY people complain about is not at all what is advocated in the book.

It's futile to bang the drum insisting that people follow the original meaning the creators intended. Look at what happened to Agile: its original meaning has been completely perverted. You can't argue the case for Agile anymore just by saying that the creators of the Agile Manifesto meant for it to be something completely different, because what other term can you use to describe the micromanaging, process-heavy framework that has currently taken its place?

DRY may have been introduced with a caveat to not conflate two pieces of code that are only incidentally similar, but people have completely disregarded it and used it to propagate code monstrosities. It's not a strawman argument, we need a term to describe situations where functions end up using four boolean flags.

I agree. The big gotcha with DRY is what counts a duplicate. To me it's duplicate _requirements_ not just code that happens to look similar at the moment.

I phrase it as "duplication is a tool". Which is to say if something is actually definitely for sure the same requirement, you can de-duplicate it to have the compiler/tooling enforce and uphold that design constraint. This is good!

Many "duplicates" are really only similar, temporarily, and by coincidence. In those cases, it's not really "duplication" and keeping it separate is almost certainly a better choice.

That's what usually happens when less experienced devs read up on DRY (among other classics that every dev has at some point heard of as a "best practice") and try to adhere to it religiously, paired with an anxious code review environment where code has to be "clean code" perfect. The early MVC-pattern communities (Rails etc.) with their overuse of acronyms like DRY did their part in misguiding new devs.
> BeetleB's Law of DRY: Every article that complains about DRY will be a strawman argument.

Counterpoint: Every article that defends DRY will be a No True Scotsman argument.

No True Scotsman is relevant only when one cannot objectively define a True Scotsman (i.e. not in a recursive manner).

Here we have an objective, original, unambiguous definition of DRY. So no, this is not a No True Scotsman fallacy.

I tend to implement my DRY with OOP (polymorphism).

That doesn't go over well, with today's crowd. Not considered "cool."

I don't see how you can throw away the modern definition of DRY and put critiques of it (the modern definition) to bed just by pointing out the historical origination. They're completely different topics and the discussions thus need to be insular.
They can't be meaningfully separated because they're using the same phrase with almost, but not exactly, the same meaning. One version having a sanity check (don't repeat the actual same logic/information, at least not excessively; usually it's paired with the "Rule of Three" which is three repetitions then look for a refactor) and the other not (don't repeat anything that happens to look alike and don't actually think, follow this rule like it's written in stone). If these aren't distinguished, then you end up with everyone talking past each other both thinking the other is an idiot (rightly from both perspectives).

People arguing against the latter are making a sane and reasonable argument. And people arguing for the former are making a sane and reasonable argument. But if, in the same discussion, both senses are meant without qualification or clarification then only confusion will be found.

Then start your critique of DRY by pointing out there are multiple definitions of it, and that you are referring to one definition.
If most people "straw man" your position then it's not because they're straw manning your position but because they misunderstand your point because it is not clear enough.
If your catchphrase is Don't Repeat Yourself and some people take it to mean they shouldn't repeat themselves, the fault is entirely with the catchphrase you used.

If a third set of people then take it upon themselves to tell everyone that Actually You Should Repeat Yourself Sometimes, this is not them attacking a strawman, it's an attempt to clear up the confusion caused by the phrase/acronym DRY.

The original definition of DRY was "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." That's not what Don't Repeat Yourself means, if read literally. Because DRY sounds like it applies to code instead of to knowledge, of course it's widely misinterpreted! If they'd called it the Fight Unnecessary Copying of Knowledge principle nobody would be having this argument and we'd all get to save ourselves a lot of time.

Yeah, that's a good one. I have found, however, that denormalized tables designed for the queries that you need to do can make a massive impact on performance, especially at scale.

However, it's also a massive pain in the Automated System Structure to keep them updated properly.