Hacker News new | ask | show | jobs
by tharkun__ 1430 days ago
The thing is that if you just need to understand a specific part of something you will need to jump as well even if everything you needed for that one thing is written sequentially in one file. You will want to skip over implentation details of certain things to get the general picture first on a more abstract level.

Let's say you have a simple endpoint that takes a list of comma separated inputs, parses them as numbers and spits back a sorted version of that.

I don't want to see a version of that, which a compiler might have inlined. Including the implementation of the sorting algorithm. I only want to see a high level of abstraction version of it. Basically just something like (pseudo code in a non existent language) :

    fun endpoint(input):
      inputs[] = split(input, ',')
      numbers[] = parseAsIntegers(inputs)
      return quicksort(numbers)
I can easily understand what this does and what the idea is behind this "algorithm" in 3 lines. If I had the "inlined" version of this I would have to manually identify each of these parts and potentially skip over tens to hundreds of lines.

I think this is really a bit about trust. Do you trust that these named functions I am calling do what their name does? Does quicksort actually do a quicksort or has someone implemented bubble sort in there? Of course this is a minimal example and especially quicksort would probavly just be a library but imagine all of these were large complex pieces of our code base.

Personally I am an advocate for using functions (methods or whatever your language calls them etc) and naming them properly and then trusting those names by default. I want to spend time making this nice and understandable and abstracted once when writing. Not every time someone reads it. As soon as something does not seem to behave in the way the name suggests I will then and only then go check the actual implementation and for example find out that parseAsIntegers actually also supports floats and quicksort is not actually quicksort but bubblesort and that is why this endpoint was slow etc.

2 comments

This would be the “simple” code. The “abstract” code would be more like this:

  public class EndpointManager {
    private EndpointInputManager eim;
    private StringSplitter splitter;
    private NumberParser parser;
    private Sorter sorter;
    public EndpointManager(EndpointInput input) {
      eim = new EndpointInputManagerFactory().setInput(input).build();
      splitter = new StringSplitterFactory().setDelimiter(new Delimiter(",")).build();
      parser = new NumberParserFactory().setFormat(NumberParserFormat.INTEGER).setMode(NumberParserMode.LIST).build();
      sorter = new SorterFactory().setSortOrder(SortOrder.ASCENDING).setAlgorithm(SortingAlgorithm.QUICK_SORT).build();
    }
    public EndpointOutput endpoint() throws ParseException {
      splitter.split(eim.getInput());
      parser.parse(splitter.getList());
      sorter.sort(parser.getOutput());
      return new EndpointOutputFactory().setOutput(sorter.getSortedList()).build();
    }
  }
My guess is this is Java or something close? We can make that much more readable. We may have to do away with bad libraries. A lot more could actually be magicked away, which can also be a problem sometimes. In a "real" application this resource's interface would probably not just take a comma separated string in a body but accept a proper JSON object or somesuch and not just be a "sort endpoint" but it's not going to be much different from this if written properly. I happen to like the few annotations you'll see me use here. I also think that something as simple as a line break can unclog things. Also the choice of having each of the stream operations in a separate line is deliberate for readability. There are linter/auto formatting rules to enforce this (we do this at my current place for example).

    @Path("/sort")
    public class SortResource {

        @GET
        public List sort(@Body String input) {
            validate(input);
            return Arrays.stream(input.split(","))
                .map(Integer::valueOf)
                .sort()
                .collect(Collectors.toList());
        }
    }
I do recognize the kind of code you pasted. Had to work in code bases like that for way too long. Never want to work in one of those again. There's probably lots of EJBs and other such nonsense around that?
I was going to make the predictable joke that you need a factory. Disappointing that you took care of it already.
You can easily understand what it does because you picked an example that is easy to understand :)

This discussion often ends up in the extremes: inline everything versus abstract everything. I don't think anybody reasonable would opt for either of those extremes, we should focus on the very large middle ground where there's a lot of subjectivity.

Trust me on this, I've been raised on the DRY dogma and all related architectural patterns in favor of abstraction. I've lived the life, for 2 decades. But I cannot ignore the outcomes. Most codebases are extremely difficult to understand and it's very painful to change things. As 90% of all software development is maintenance, that's a planetary-sized problem.

This doesn't mean you should inline everything, it means sane choices. As a simple example, say you're using a literal in your code:

(if orderAmount > 100000)

This code is incredibly easy to read. Common convention says to put this in a constant, at the top of the file. Old me agrees, new me does not. For as long as it's the only occurrence of the value, it doesn't need abstraction. The only thing it would do is make the code more difficult to understand. The very eager abstracter might even put that constant in a separate file.

The point of this example is to abstract based on real reuse, not imagined reuse. I'm not against abstraction only against unnecessary abstraction.

A second example. Say you have a reusable UI component. A change request comes in that's pretty large and specific for one niche need. It kind of goes against the spirit of the initial purpose of the component but is still related enough to consider it in scope of the component.

Old me might add a "toggle" to the component, after which it can render in two modes. Sometimes called a "god component". This approach sucks. It makes the component much more complicated and changing and testing it becomes a nightmare.

Even older me would break down the component into smaller components and then "compose" them based on its mode. This is even worse, now you have to jump around many places whilst the sub components are never actually reused (pointless abstraction).

New me says fuck it and splits the component in two. Allowing significant code duplication between both components. It's not as radical as it sounds, it's in fact incredibly comforting. Each component can easily be understood (less complexity) and making changes becomes far less stressful as your blast radius is tiny.

Developers spent the vast majority of their time not coding, instead figuring out how something works and how to make a change that doesn't break anything.

It may not have been evident from my simple example but I do agree with your "middle" approach. That's where I try to end up in our code base. Endless interfaces, methods that are only one line long and such are counter productive. But nobody can tell me that inlining quicksort will ever be useful outside of a place where your compiler can't do it for you and you need to favour execution speed over everything else. I don't believe such places really exist much if at all any longer.

What I do have to have to question is the strict non-use of constants. It can be very very useful to use constants for such things, e.g. if you are calling libraries that do not make it apparent what is what. Say you have something that takes a timeout value.

    send(data, 10)
What is this? I have to know what send is, what parameters it takes etc. I might have to look that up. I can easily work around that with a constant.

    const timeoutInMillis = 10
    send(data, timeoutInMillis)
The same principle can easily apply for other similar situations. I really like it for things like

    doSomethingThatCouldTakeLongButAlsoShouldHaveATimeout(data, 64800000)
What is that and what does that value even mean in human readable? Of course some of these values you will recognize if used enough but so far my domains have been sparse enough that I don't recognize all of them and have to compute. Much better (with shorter, real names anyway but ya know, we're dealing in simple examples here :) ):

    REALLY_LONG_RUNNING_PROCESS_TIMEOUT_IN_MILLIS = 1000 * 60 * 60 * 18
    doSomethingThatCouldTakeLongButAlsoShouldHaveATimeout(data, REALLY_LONG_RUNNING_PROCESS_TIMEOUT_IN_MILLIS)
So far most people I've talked to find that it's much easier to recognize that this has a timeout of 18 hours but the method happens to want milliseconds.

Oh and don't get me started on people that use the constants from the real code in their tests, completely defeating the testing. Especially if they then do math with the constants and simply copy the math - or worse, put the math into a method and call it from the tests too - to their tests. Test expectations have to be computed once, when writing the test and just hardcoded into them, otherwise they serve no purpose as changing the code itself will always result in green tests even if you've just made a major mistake by changing the values without thinking.