Hacker News new | ask | show | jobs
by majewsky 3647 days ago
I really like the advice from "Perl Best Practices" to code in paragraphs. Sometimes, a large function cannot be broken up usefully, because a lot of state needs to be shared between the different parts, or because the parts don't have a meaning outside of the very specific algorithm.

In that case, code in paragraphs: Split the function body into multiple steps, put a blank line between these and, most importantly, add a comment at the start of the paragraph that summarizes its purpose.

Now when someone else finds your function, they can just gloss over the paragraph headings to get an idea of the function's overall structure, then drill down into the parts that are relevant for them.

6 comments

Situations like this are exactly where nested functions can be helpful.

I've always thought that it was a shame that C didn't have them.

Sometimes I almost wish that Algol flavored languages like Pascal anf Modula 2 would have won out for systems programming, instead of C and the languages it inspired.

Actually, GNU C supports nested functions, and a new round if standardization is just starting up, so maybe there's a chance?

In languages where braces define a scope even if there's no if, for, etc. keyword around, you can get a lightweight version of that just by sticking braces around your "paragraphs", as needed. They aren't the same as nested functions, in particular because you can't invoke a naked block multiple times, but if you've got a long function that hasn't got any useful break points in it, but you just want to chunk things, curly braces may help you isolate things nicely.

I tend to consider this a "last resort" over using actual functions, but there are certain classes of functions where this is helpful. The two biggest I know of are functions that represent a state machine, and functions that are the big top-level "plumb all the libraries together" where breaking that up into separate functions significantly complicates code due to all the intricate routing of values you have to do.

> They aren't the same as nested functions, in particular because you can't invoke a naked block multiple times,

Sure you can, just use `goto`! It's fantastic for code reuse. ;p

On a more serious note, gcc and clang both support block functions now. Pretty sure they aren't full closures but they can be handy in these situations.

The thing is, naked scope blocks are missing the main reason I'd use a nested function: the ability to be named.

I do occasionally use scope blocks when I want to constrain the scope of one or more local variables, and there isn't an otherwise appropriate scope already created by a flow control construct.

I also use naked scope blocks sometimes. To me, their main purpose is to make temporary variables go out of scope, to clarify which variables are still intentionally live beyond the block.
> Situations like this are exactly where nested functions can be helpful.

Indeed. When I'm writing Haskell, I'm using a lot of nested functions in `where` blocks, sometimes cascaded down.

Shared state between different parts of the code sounds an awful lot like something that could be a class.
Yes, a Method Object pattern http://c2.com/cgi/wiki?MethodObject
Neat! Now I have to refer to an entirely different file (or a different section of this one, which is almost as bad) in order to figure out what this one function is doing.
maybe both are related.

I never understood people that create a function only to set a flag or a set of flags in another and pass everything else.

Also sometime doing code that looks like this

> > if a: > getting_started(d,e) > if a and c: > maybe_prepare(f,g) > common(d,e,f,g) >

Instead of

> > getting_started(a,b,c,d,e) > maybe_prepare(a,b,c,d,e) > common(a,b,c,d,e) >

Usually I leave branching for the leaf code. It's maybe just me.

I'll second Sindisil's comment about nested functions.

I still often do the paragraph approach, but the thing about that is you rely on the comments to stay up to date to describe that paragraph.

I prefer to instead move that paragraph into a small nested function and name it to reflect what would be in that comment. I feel function names are easier to keep up to date than comments (ie, programmers don't just read past the function names like they do comments).

I completely disagree, every method can be split in private methods. In that way you don't need awful and unhelpful comments in the middle because you can understand what it does simply from the method name.
I've gone back and forth on this one over the years. My current advice would be that if you can find something that is naturally a sub-function, factor it out as one. Keep it private initially, but do not do this if that private function makes absolutely no sense on its own and your public code isn't calling it from more than one site.

If you factor things out into sub-functions that have no semantic meaning on their own then all you are doing is making the code harder to understand. You're also making it harder to maintain because of all the extra state that will need to be passed around between the sub-functions, which may change later.

For large complex functions that cannot be broken up sensibly, the paragraph splitting method is exactly what I use. Such large functions are a code smell and you should think carefully about whether it really does need to be so large, but there are indeed cases where it is the best option. Nobody should get too attached to dogmatic rules like "no function may exceed 25 lines".

The style that I prefer is slightly different to the GP though. I usually put about a paragraph of comments at the top explaining why the function is so long, giving an overview of the algorithm and other information that is inappropriate for a JavaDoc-style comment (since those are for the function consumer rather than a maintainer). I then give each "paragraph" of code a section heading and sometimes number these (especially if I have written out the algorithm in line-numbered pseudocode in the explanatory comment).

Totally agree. While I respect and try to adhere to the organization and naming philosophies of the "Clean Code" approach, I've also been bitten later when going over something that I hadn't worked on in months. I concluded that my brain has a state as well as the program, and comments are for reloading my brain state. Only when my brain is in the correct state do my names make complete sense. And still, unless I had ample time to make sure everything is perfectly coherent (how many times does that happen?) I'm still sometimes left wondering.

A comment that clearly explains "why," and sometimes even "how," is extremely helpful no matter how Hemmingway I think I am in the moment. And I also find chasing little factored-out bits of functions all over the place is tedious- like reading a paragraph of prose where a lot of the true meaning is delegated to footnotes. You're either skipping it, or constantly stopping and looking for the damn footnote. In code, sometimes that means grepping the project to find the damn thing. If it's small and used once, consider leaving it in.

Every method can be split sensibly. Using the functional paradigm it becomes natural to understand it because you usually work in the opposite way using composition of functions rather than dictating what happens in an imperative way. And large, complex functions do increase the cognitive load. I personally abhor regions or sections because most of the time they can go in a separate method and they just break the code-flow with something completely unrelated.
Methods can be split into private methods. But what that sometimes means is, when you're trying to understand how a piece of information flows through a method to fix a bug, instead of it being all self-contained in the method, instead you have to jump around to 10 places in a file to see where the information goes. That severely increases the cognitive overhead.
There is also a cognitive overhead when trying to find local variables defined in huge functions, at least when they are split into smaller private (static) methods (with variables passed as arguments) it's easier to track.

Both in the parent method and in the encapsulated child.

> you can understand what it does simply from the method name

That can be tough sometimes. How do you handle the case where you've created a function just to package some block of code that would otherwise be repeated 40 times? You end up with function names like

    add_to_list_when_cromulent() 
or even worse

    rebuild_stats_helper()
Or you have the situation where every time you do action A, it usually needs to be followed with action B. Because you want functions to do one thing only you have two functions action_A() and action_B(). But since you are always going to do them in pairs, you end up with a group action_A_and_B() that just calls the two functions sequentially.

I think I've settled on helper functions that are static or in anonymous namespaces (I work in C++) whenever possible.

The first case is a self-explicative one liner, it doesn't need an external function.

    var evenNumbers = Enumerable.Range(0, 40).Where(i => i%2 == 0).Select(i => i);
Private methods must group non-elementary steps. The second case is a perfect example that explains why it is so much better to use a private method because now you have a method that groups both sub-methods and you don't need to call always two different methods in a bunch of places. In this way you decreased code duplication increasing clarity.

    private void ProcessLogs()
    {
        var events = _logProvider.Read();
        events.Where(e => e.IsError).ForEach(SendErrorAlert);
        events.Where(e => e.IsWarning).ForEach(SendWarningAlert);
    }
And the method name seems quite self-explanatory to me.

If you need to separate one method in paragraphs, and add comments that explain what each paragraph does then that code is SCREAMING for a refactoring, deleting all the useless comments and extracting that mess in properly structured Classes/Methods.

I see this all the time in our PRs:

   UtilHelper.processItems(List<object> items) : List<object>
Here's John Carmack's take on the issue:

http://number-none.com/blow/john_carmack_on_inlined_code.htm...

TL;DR: He's in favor of inlining functions.

Actually, if you are inlining, he says: "you should be made constantly aware of the full horror of what you are doing."

I wouldn't say he's in favor of it, he actually appears to be advocating for a pure FP approach. But if you have to, inlining is OK, with the quoted caveat.

That's literally the first paragraph in the article, keep reading. Also, I think you misunderstood it.

> if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing.

The horror he refers to is not inlining, it's dealing with stateful logic. If you are doing state it's better to be aware of the horror making it explicit by inlining rather than hiding the state changes via passing and receiving in functions.

I'm pretty sure he's advocating for inlining one-off functions.

I agree and disagree with you :). Yes, I agree, he's advocating in-lining one-off functions, and I can see both points of view from that.

Having said that, how often do you have a large method where it isn't modifying a lot of state? I guess it depends on the context like a lot of things, but it seems like that's what is happening in the code I deal with (large methods changing state as well as having a lot of dependencies).

But the "article" isn't an article exactly, it's a commentary of an email that's 7 years old. The original email seems to completely advocate in-lining; the "article" part backtracks somewhat, and that's what my comment was around. I'm not always the clearest in my writing though.

sorry, but given this pre-condition (from gp):

> ... a large function cannot be broken up usefully, because a lot of state needs to be shared between the different parts ...

there is no way to break that up with multiple smaller functions without stowing away the state somewhere.

sometimes reading a large function is not half as bad as reading 10 different ones with each altering the shared state.

state can be passed through as arguments...
When coding in C, I often package the state that's shared between a high-level function and its local subroutines in a local "struct context" that's instantiated in the high-level function and then passed by address as the first argument to the subroutines. Makes it easy to see what the shared state is, and adding/changing the shared state doesn't require changing all the formal and actual argument lists.
Me too. OOP is actually having several such states. My go to pattern is to root states in an application tree.
Sure, but then there are more questions :) e.g. how many parameters ? 3, 4 ... ? what if they are of the same type ? would you change your numbers then ? users can get the order wrong etc. another thing : if you pass too many parameters, isn't that a hint to the fact that something is amiss ?

edit-1 : fixed typo

Remember this is in the case of spiting out smaller functions to improve readability.

Variable naming will only add to that, as you can rename variables in the extracted methods for their local purpose.

If you have too many arguments you can package them up in a object/map/tuple of your choosing (depending on language).

(Reverting to local state would in my opinion increase confusion and decrease readability.)

State can be stored in instance variables as well, and should be if many small functions share them, it's what objects are for.
> State can be stored in instance variables as well...

but then we are back to the beginning of the thread...

A easy way of making code into "paragraphs" with comments is to just move it into a function. So in the end, this large function you're talking about, is just calling the other ones, creating a paragraph while the functions are just "words". Makes it easy to test and no need for comments :)
This creates the risk of making what you are doing explicit at the cost of obscuring why you are doing something.

I frequently catch tests in code reviews that don't make it clear why something is being tested and what the overall expected result is beyond the effects being tested.

If the methods you are creating are private and for the purpose of segmenting a block of code, you really shouldn't be testing it directly, as it almost certainly won't be part of your public API.
This is one of my favorite things.