by viktree 1853 days ago
> How does splitting code into multiple functions suddenly change the order of the code?

Regardless of how smart your compiler is and all the tricks it pulls to execute the code in much the same order, the order in which humans read the pseudocode changes:

  01. function positiveInt max(positiveInt[] values)
  02.   positiveInt max = 0
  03.   for (positiveInt i=0; i < values.length; i++) 
  04.     max = values[i] > max ? values[i] : max
  05.   return max

  07. function positiveInt total(positiveInt[] values)
  08.   positiveInt total = 0
  09.   for (positiveInt i=0; i < values.length; i++) 
  10.     total = total + values[i]
  11.   return total

  12. function positiveInt calcMeaningOfLife(positiveInt[] values)
  13.   return total(values) - max(values)

Your modern compiler will take care of the order in which the code is executed, but we humans need to trace the code line by line as [13, 12, 01, 02, 03, 04, 05, 07, 08, 09, 10, 11]. By comparison, the inline case can be understood sequentially by reading lines 01 to 07 in order.

  01. function positiveInt calcMeaningOfLife(positiveInt[] values)
  02.   positiveInt total = 0
  03.   positiveInt max = 0
  04.   for (positiveInt i=0; i < values.length; i++) 
  05.     total = total + values[i]
  06.     max = values[i] > max ? values[i] : max
  07.   return total - max

> Better? No?

In most cases, yeah, you're probably better off with the two helper functions. max() and total() are common enough operations, and they are named well enough that we can easily guess their intent without having to read the function bodies.

However, depending on the size of the codebase, the complexity of the surrounding functions, and the location of the two helper functions, it's easy to see that this might not always be the case.

If you are trying to understand the code for the first time, or trying to track down some complex bug, there's a chance that having all the code inline would help you.

Further, splitting up a large inline function is easier than reassembling many small functions (hope you've got your unit tests!).
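As a rough sketch of what that safety net might look like, here is a Python rendering of both variants from above plus a check that they agree. The names `max_value`, `calc_split`, `calc_inline` and `check_variants_agree` are made up for illustration, and plain non-negative Python ints stand in for positiveInt.

```python
def max_value(values):
    """Largest element, or 0 for an empty list (inputs assumed non-negative)."""
    largest = 0
    for v in values:
        largest = v if v > largest else largest
    return largest


def total(values):
    """Sum of all elements."""
    running = 0
    for v in values:
        running = running + v
    return running


def calc_split(values):
    # Helper-function variant: two passes, one per helper.
    return total(values) - max_value(values)


def calc_inline(values):
    # Inline variant: one pass that tracks both values.
    running = 0
    largest = 0
    for v in values:
        running = running + v
        largest = v if v > largest else largest
    return running - largest


def check_variants_agree():
    # The kind of test you want before inlining or splitting anything.
    samples = [[], [7], [1, 2, 3], [5, 5, 5], [10, 0, 3, 8]]
    for sample in samples:
        assert calc_split(sample) == calc_inline(sample), sample


check_variants_agree()
print(calc_inline([1, 2, 3]))  # prints 3
```

With a check like this in place, moving between the two shapes in either direction is a lot less scary.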

> And sometimes it does not even make sense to keep this order.

Agreed. But naming and abstractions are not trivial problems. Oftentimes it's in the larger, more complex codebases that you see these practices applied more dogmatically.


Well, inlining by the compiler would be expected but we do not only write the code for the machine but also for another human being (that could be yourself at another moment of time of course).

Splitting the code into smaller functions does not automatically guarantee a better design; it is just one heuristic.

A naive application of the principle could well have ended up with a worse solution:

  function positiveInt max(positiveInt value1, positiveInt value2)
    return value1 > value2 ? value1 : value2

  function positiveInt total(positiveInt value1, positiveInt value2)
    return value1 + value2 

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    positiveInt total = 0
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++)
      total = total(total, values[i])
      max = max(max, values[i])
    return total - max

Now, this is a trivial example, but we can imagine that instead of max and total we have some more complex calculations or even calls to some external system (a database, an API, etc.).

When faced with a bug, I would certainly prefer the refactoring in the GP comment to the one here (or to the initial implementation).

I think that when inlining feels strictly necessary, there has been a problem with how the boundaries were defined. But I agree that being able to view one single execution path inlined can help in understanding the implementation.

I completely agree that naming and abstractions are perhaps the two most complicated problems.

> but we do not only write the code for the machine but also for another human being (that could be yourself at another moment of time of course).

That's the thing, isn't it? Various arguments have been raised all across this thread, so I just want to put a spotlight on this principle, and say:

Based on my prior experience, I find code with a few larger functions much more readable than code with lots of small functions. In fact, I'd like a tool that could perform the inlining described by the GP for me whenever I'm working in a codebase that follows the "lots of tiny functions" pattern.

Perhaps this is how my brain is wired, but when I try to understand unfamiliar code, the first thing I want to know is what it actually does, step by step, at low level, and only then, how these actions are structured into helpful abstractions. I need to see the lower levels before I'm comfortable with the higher ones. That's probably why I sometimes use step-by-step debugging as an aid to understanding the code...

>the first thing I want to know is what it actually does, step by step, at low level

I feel like we might be touching on some core differences between the top-down guys and the bottom-up guys. When I read low level code, what I'm trying to do is figure out what this code accomplishes, distinct from "what it's doing". Once I figure it out and can sum up its purpose in a short slogan, I mentally paper over that section with the slogan. Essentially I am reconstructing the higher level narrative from the low level code.

And this is precisely why I advocate for more abstractions with names that describe their behavior; if the structure and the naming of the code provide me with these purposeful slogans for units of work, that's a massive win in terms of effort to comprehend the code. I wonder if the way the bottom-up guys understand code is substantially different? Does your mental model of code resolve to "purposeful slogans" as stand-ins for low level code, or does your mental model mostly hold on to the low level detail even when reasoning about the high level?

> Does your mental model of code resolve to "purposeful slogans" as stand-ins for low level code,

It does!

> or does your mental model mostly hold on to the low level detail even when reasoning about the high level?

It does too!

What I mean is, I do what you described in your first paragraph: I try to see what happens at the low level, and build up some abstractions/narrative to paper it over. However, I still keep the low-level details in the back of my mind, and they inform my reasoning when working at higher levels.

> if the structure and the naming of the code provide me with these purposeful slogans for units of work, that's a massive win in terms of effort to comprehend the code

I feel the same way. I'm really grateful for good abstractions, clean structure and proper naming. But I naturally tend to not take them at face value. That is, I'll provisionally accept the code is what it says it is, but I feel much more comfortable when I can look under the hood and confirm it. This practice of spot-checking implementation saved me plenty of times from bad naming/bad abstractions, so I feel it's necessary.

Beyond that, I generally feel uncomfortable about code if I can't translate it to low-level in my head. That's the inverse of your first paragraph. When I look at high-level code, my brain naturally tries to "anchor it to reality" - translate it into something at the level of sequential C, step through it, and see if it makes sense. So for example, when I see:

  foo = reduce(map(bar, fn), fn2)

My mind reads it as both:

- "Convert items in 'bar' via 'fn' and then aggregate via 'fn2'", and

- "Loop over 'bar', applying 'fn' to each element, then make an accumulator, initialize it to the first element of the result, loop over the remaining results, setting the accumulator to 'fn2(accumulator, element)', and return that - or an equivalent but more optimized version".

To be able to construct the second implementation, I need to know how 'map' and 'reduce' actually work, at least on the "sequential C pseudocode" level. If I don't know that, if I can't construct that interpretation, then I feel very uncomfortable about the code. Like floating above the cloud cover, not knowing where I am. I can still work like this, I just feel very insecure.
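To make that second reading concrete, here is roughly what I'd expect it to expand to, sketched in Python. Note that Python's built-ins take the function first (`map(fn, bar)` and `functools.reduce(fn2, ...)`), unlike the pseudocode above. The helper name `map_then_reduce_expanded` and the sample `bar`, `fn` and `fn2` are made up for the example.

```python
from functools import reduce


def map_then_reduce_expanded(bar, fn, fn2):
    """Hand-written, 'sequential C' style version of reduce(fn2, map(fn, bar))."""
    # Step 1: loop over 'bar', applying 'fn' to each element.
    mapped = []
    for item in bar:
        mapped.append(fn(item))

    # Step 2: initialize the accumulator to the first mapped element...
    if not mapped:
        # functools.reduce also raises TypeError on empty input with no initial value.
        raise TypeError("cannot reduce an empty sequence without an initial value")
    accumulator = mapped[0]

    # ...then fold the remaining elements into it via 'fn2'.
    for element in mapped[1:]:
        accumulator = fn2(accumulator, element)
    return accumulator


# Placeholder inputs, assumed purely for illustration.
bar = [1, 2, 3, 4]
fn = lambda x: x * x          # square each item
fn2 = lambda acc, x: acc + x  # sum the results

high_level = reduce(fn2, map(fn, bar))
low_level = map_then_reduce_expanded(bar, fn, fn2)
assert high_level == low_level == 30
```

The real `map` is lazy and never builds the intermediate list, which is exactly the "equivalent but more optimized version" part.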

One particular example I remember: I was very uncomfortable with Prolog when I was learning it in university, until one day I read a chapter about implementing some of its core features in Lisp. When I saw how Prolog's magic works internally, it all immediately clicked, and I could suddenly reason about Prolog code quite comfortably, and express ideas at its level of abstraction.

One side benefit of having a simultaneous high- and low-level view is that I have a good feel for the lower bound on the performance of any code I write. Like in the map/reduce example above: I know how map and reduce are implemented, so I know that the base complexity will be at least O(n), how the complexity of `fn` and `fn2` influences it, what the data access pattern will look like, what memory allocation will look like, etc.

Perhaps performance is where my way of looking at things comes from - I started programming because I wanted to write games, so I was performance-conscious from the start.

>If I don't know that, if I can't construct that interpretation, then I feel very uncomfortable about the code.

This is probably the biggest difference with myself. If I have a clear concept of how the abstractions operate in the context of the related abstractions and the big picture, I feel perfectly comfortable not knowing the details of how it gets done at a lower level. To me, the details just get in the way of comprehending the big picture.

A common problem with code written like that is checking the same preconditions repeatedly (or worse, never) and transforming data one way and back for no reason. I remember a bug I helped a fresh graduate who had joined our project fix. The code crashed with an NPE when a list was empty. That's weird, because an empty list should cause an IndexOutOfBounds if anything, and the poor guy was stumped.

I looked at the call stack: we got the list as input, then it was changed to null if it was empty, then it was checked for size, and in yet another function it was dereferenced and indexed.

The guy was trying to fix it by adding yet another if-then-else, 5 levels down the call stack from the first place the size was checked. No doubt another intern would then have added even more checks ;)
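The shape of that bug, as a minimal Python analogue (the original was presumably Java, and I'm reconstructing from memory, so every name here is hypothetical). `None` plays the role of null, and indexing it raises TypeError where you'd expect an IndexError:

```python
def normalize(items):
    # Layer 1: quietly turns an empty list into None.
    return items if items else None


def first_item(items):
    # Several calls deeper: indexes the value without knowing about layer 1.
    return items[0]


def process(items):
    # The call chain from the story, collapsed to two hops.
    return first_item(normalize(items))


def process_checked_once(items):
    # One possible alternative: validate once at the boundary, keep the list a list.
    if not items:
        raise ValueError("items must not be empty")
    return items[0]


try:
    process([])
except TypeError as exc:
    # Not the IndexError an empty list would normally give you.
    print(type(exc).__name__)  # prints TypeError
```

Adding another check inside `first_item` would hide the symptom, but the surprising part is `normalize`, several frames away from where the crash shows up.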

If you don't know what happens to your data in your program you're doing voodoo programming.