Hacker News new | ask | show | jobs
by MichaelGlass 1449 days ago
Agreed. To use the example from the article

`make_pizza(["pepperoni"])`

What does `make_pizza()` do? It could be a lot or it could be a little. It could have side-effects or not. Now I have to read another function to understand it, rather than easily skimming the ~four lines of code that I would have to repeat.

I think the article fails to show particularly problematic examples of DRY. E.g. merging two ~similar functions and adding a conditional for the non-shared codepaths. shudders

7 comments

> What does `make_pizza()` do? It could be a lot or it could be a little. It could have side-effects or not. Now I have to read another function to understand it, rather than easily skimming the ~four lines of code that I would have to repeat.

This is not a problem of DRY. This is a problem of wrong abstraction and naming. If the function is just four lines, it could easily be named `make_and_cook_pizza`. In the alternative scenario where those four lines are copy pasted all over the place, one is never sure if they are exactly the same or have little tweaks in one instance or the other. Therefore, one has to be careful of the details, which is much harder than navigating to function definition, because in this case you cannot navigate to other instances of the code.

Exactly this. I fixed a problem like this a week ago. I found some duplicated code, factored it out into one place by introducing an abstract base class (Python) and in the process discovered one of the duplicated methods had a logic error leading to returning a slightly smaller integer result.

The code had test coverage, but the test confirmed that it produced the wrong result. I had to fix the test too.

So your refactor broke the tests, so you assumed the tests must be wrong.
So his refactor fixed a bug and broke a test which he fixed, so you assume he must have assumed instead of verified.
in a sense yes, in a sense no. if you see a function and know its sort of black box properties and its inputs and outputs are well defined, you really don't need to care. however, that applies whether the code is in an external function/module or physically inlined into your code. the sectioning off into separate code is then there to forcefully tell the reader "don't even try to care about the implementation details of this", so in practice your point still applies.

however... real software doesn't work like this. the abstractions that work that way exist for a select few very well understood problems where a consensus has developed long before you're looking at any code.

math libraries would be a typical example. you really don't need to know how two matrices are multiplied if you know the sort of black box properties of a matrix.

but the minute functions, classes, and other ways of abstraction code in a DRY way that you encounter constantly in everyday code, even if they are functionally actually well abstracted (meaning it does an isolated job and its inputs and outputs are well defined), even for simple problems, are typically complex enough that learning their abstract properties can be the same level of difficulty and time investment as learning the implementation itself. on top of practical factors like lack of documentation.

this is also why DRYness as a complicating factor really doesn't factor in once the abstracted code does something so complex that there is no way you could even attempt to understand it in a reasonable amount of time. like implementing a complex algorithm, or simply just doing something that touches too many lines of code. in this case you are left to study the abstract properties of that function or module anyways.

I think that drawing conclusions from these examples is not productive at all. In the wild we're going to see functions such as

    def make_string_filename(s):
        # four lines of regex and replace magic
so that we have code like

    file_src = make_string_filename(object_name)
    file_dst = make_string_filename(object_name_2)
which is much more understandable than eight lines of regex magic where you don't even know what the regex is doing.

The problem of not knowing what it does or whether it has side effects or not is more a problem of naming and documentation than DRY. Even then, it's still better than repeating the code all over, simply because when you read and understand the function once, you don't need to go back. On the other hand, if the code is all over, you need to read it again to recognize it's the same piece of code.

if those 8 lines of regex have been unit tested and the function is commented to describe "what" the code does, it is entirely the point that you don't need to understand how it works

additionally, the function should be stateless and have no side effects ;)

How do you test 8 lines of regex inside a function that does more things? And what's easier, to write and read the function name or copy-paste the lines with the comment (if the comment explaining what that piece does is even written, that is)?
i'm a bit confused, do we need to know the nitty gritty details of how Math.random() is written, or that it will reliably give us a random double?
you dont you mock the function that have the regex with a fixed behavior and check the actual logic inside the wrapper function
That's fair. But maybe someone wants to reuse this in another place so they do this:

``` def make_string_filename(s, style="new"): # 2 lines of shared magic if style == "old" # 2 lines of original magic elif style == "new": # different 2 lines of magic ```

When you get here, two totally separate `make_string_filenames()`, each private to the area of code they're relevant to, would be better.

The ideal is having 3 functions I think:

- make_string_filename_style1

- make_string_filename_style2

- make_string_filename

Then make_string_filename consists of logic to use the right style.

Or one function and a Sum type to be called like:

    makeStringFilename Style1 "somestring"
Given sum type:

    data FilenameStringStyles = Style1 | Style2
Except that there should be only one way to make a filename from a string. Maybe some options like "allow_spaces" if needed but the point of DRY is not only to share code but to share algorithms.
Yup. But I guess that typically happens in steps. So next DRY-programmer that comes along will add a cheezeFilledCrust boolean to that make_pizza function and so on. Every time it will seem more reasonable to add another boolean, because otherwise you have to remove the make_pizza function, and there would be SO MUCH CODE DUPLICATION.

I’ve seen this again and again in the field and I wholeheartedly agree with the sentiment in the OP. IMHO different code paths should only share code if there is good reason to believe that the code will be identical forever.

Now the next genius turns up and says that make pizza is at it's core always a n-step domain process.

So now you've dumped it down to an interface with a default implementation which calls the create_dough, add_toppings, bake_pizza interfaces in order, each of which are either passed in callbacks or discovered through reflection.

We can even sprinkle in some custom DSL to "abstract away" common step like putting the product into the oven correctly!

Jr's will never understand when why and what is effectively excecuted at runtime. Honestly, at this point I enjoy working with this kind of code. It's always such a high entertainment value and I get paid by the hour, so whatever

This is discussed in detail in "The Wrong Abstraction" by Sandi Metz

https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction

Quote follows:

----

The strength of the reaction made me realize just how widespread and intractable the "wrong abstraction" problem is. I started asking questions and came to see the following pattern:

1. Programmer A sees duplication.

2. Programmer A extracts duplication and gives it a name. This creates a new abstraction. It could be a new method, or perhaps even a new class.

3. Programmer A replaces the duplication with the new abstraction. Ah, the code is perfect. Programmer A trots happily away.

4. Time passes.

5. A new requirement appears for which the current abstraction is almost perfect.

6. Programmer B gets tasked to implement this requirement. Programmer B feels honor-bound to retain the existing abstraction, but since isn't exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter. What was once a universal abstraction now behaves differently for different cases.

7. Another new requirement arrives. Programmer X. Another additional parameter. Another new conditional. Loop until code becomes incomprehensible.

8. You appear in the story about here, and your life takes a dramatic turn for the worse.

Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it (the "sunk cost fallacy"). It's as if our unconscious tell us "Goodness, that's so confusing, it must have taken ages to get right. Surely it's really, really important. It would be a sin to let all that effort go to waste."

Not really a sunk cost fallacy. Existing code needs to be maintained. Some of it should be deleted since it costs more to maintain. Some of it shouldn’t be deleted since it might bite you in the behind when you realize that all of that code was correct (although gnarly) and now you’ve introduced regressions. And which code is which? Hard to say.

Sunk cost (fallacy) is about making decisions based on things that you have already lost. But you haven’t lost or expended the code—the code is right there, and it’s hard to know if it’s more of an asset or a burden.

Some languages handle massive parameter lists better than other (ex with defaults). There are also design patterns for this type of problem (ex a PizzaBuilder).
100% agree and I just wrote this response and then saw you said it better!
But that is usually a problem with abstraction, rather than a problem with a method call. If I can trust what make_pizza does, that is much faster to read than any four lines of code.

A functional style certainly helps. I get the pizza in my hand and don’t have to worry that anyone left the oven on.

> If I can trust what make_pizza does

You can't, unless it's in a standard library or a core dependency used by millions of people.

That's one of the reasons why functional code is generally easier to read. A lambda defined a few lines above whatever you're reading gives you the implementation details right there while still abstracting away duplicate code. It's the best of both worlds. People who's idea of "functional programming" is to import 30 external functions into a file and compose them into an abstract algorithm somewhere other than where they're defined write code that's just as shitty and unreadable as most Java code.

> If I can trust what make_pizza does

>> You can't, unless it's in a standard library or a core dependency used by millions of people.

You can if you have reasonably competent colleagues. And if you do make some wrong assumptions about what a certain method does, it should be caught by your tests.

I feel that people that insist on reading and understanding all the code, and write code that has to be read fully to be possible to understand what is does, have missed something quite fundamental about software development.

Thanks - I like this point. I think it's probably a better illustration of what I'm trying to say in my third point. Devs are biased towards adapting existing shared code so we end up with shared libraries picking up little implementation details from each of their consumers and ultimately becoming very messy.
Arguably one could say that this is a typing (as in type system) problem

`makePizza :: PizzaType -> [Toping] -> IO (Pizza)`

Seems to carry all that information by just accepting a PizzaType symbol and a list of toppings, `IO` communicating the side effect.

> I think the article fails to show particularly problematic examples of DRY. E.g. merging two ~similar functions and adding a conditional for the non-shared codepaths. shudders

Not a problem of DRY, but bad code structure.

Just keep the two functions and pull the shared code-path out

Not all the time. When the similar code mixes types and the common codepaths are sprinkled multiple times over it you can either have the code there twice, or have an overcomplicated templated common function.

In these cases factorizing may or may not be a good idea.

I think it's just that for every complex topic, any general rule will break down at some point. That doesn't tell you that the rule is bad, but to learn how to tell when you're dealing with such an exception.