by wyldfire 3566 days ago
> Sometimes a gritty algorithm really does make the most sense if it's laid out as a 40 line function in one place rather than spread across 5 different classes so each part is independently swappable and testable.

I agree with you, but it feels a little like a false dichotomy. You can pretty easily decompose a long function into multiple smaller ones, within the same file. ~40 lines is close to the breaking point IMO. If it grows much beyond that, you can probably see groups of 10+ lines of code that have some sane subset of inputs/outputs. Toss those lines in another function, give it a name, make sure it's not exported outside of this file. If the 40+ line function gets reduced to a 10-20 line one with a few more stack frames, it is probably worth it.
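A minimal Java sketch of the refactoring described above. The class, the method names, and the pricing rule are all hypothetical, chosen only for illustration: the formerly long method becomes a short outline, and the extracted pieces are `private`, so nothing outside the file can call them.

```java
import java.util.List;

// Hypothetical example: names and pricing rules are illustrative only.
final class OrderTotals {

    // The former long function, now a short outline of named steps.
    static long totalCents(List<Long> itemPricesCents, boolean isMember) {
        long subtotal = subtotal(itemPricesCents);
        long discount = discount(subtotal, isMember);
        return subtotal - discount;
    }

    // Extracted group of lines with a small, clear input and output.
    // private: not callable from outside this class.
    private static long subtotal(List<Long> itemPricesCents) {
        long sum = 0;
        for (long price : itemPricesCents) {
            if (price < 0) {
                throw new IllegalArgumentException("negative price: " + price);
            }
            sum += price;
        }
        return sum;
    }

    // Made-up rule: members get 10% off orders of 5000 cents or more.
    private static long discount(long subtotalCents, boolean isMember) {
        if (isMember && subtotalCents >= 5000) {
            return subtotalCents / 10;
        }
        return 0;
    }
}
```

Whether the extra two frames are worth it depends on the code; the sketch only shows the shape of the change.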

What size of codebase are you talking about?

If it's 100,000 lines of code, and you break stuff every 40 lines, you have now introduced 2500 procedures many of which don't really need to exist. But because they do exist, anyone who comes along now has to understand this complex but invisible webbing that ties the procedures together -- who calls who, when and under what conditions does this procedure make sense, etc.

It introduces a HUGE amount of extra complexity into the job of understanding the program.

(Also you'll find the program takes much longer to compile, link, etc, harming workflow).

I regularly have procedures that are many hundreds of lines, sometimes thousands of lines (The Witness has a procedure in it that is about 8000 lines). And I get a lot done, relatively speaking. So I would encourage folks out there to question this 40-line idea.

See also what John Carmack has to say about this:

http://number-none.com/blow/blog/programming/2014/09/26/carm...

I realize you're in the gaming world, and that might be much different; but in the business application world (and I don't mean simple CRUD applications) anytime I've seen code like you describe, 1,000+ lines in a single class, let alone a single method, it's a complete mess.

When you factor it into a bunch of 40-line things, is it really less of a mess, or is it just that you can't see the mess any more -- maybe it looks clean, but if you pick up the rug, the room is filled with dust?

I think also what you're talking about is a function of programmer skill. I think if you have a good programmer write a 1000-line procedure, and a bad programmer write a 1000-line procedure, you are going to get drastically different things ... just like with anything.

If you were to take a poorly written 1000+ line function and split it up into 25 functions, you still have a complete mess AND you don't know where to find anything. If you look closer at those 1000+ line functions that are a dumpster fire you'll see the issue is probably more to do with tight coupling and/or hidden state changes to "global" variables than it is the length of the function itself.
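A small Java sketch of the coupling problem described above, with hypothetical names and a made-up tax rule. The "coupled" methods are short, yet their result depends on a hidden static field and on call order; the last method is no shorter, but everything it needs arrives through its parameter.

```java
// Hypothetical example: names and the tax rule are illustrative only.
final class InvoiceMath {

    // Hidden shared state: the coupled helpers below read and write this field.
    private static long runningTotalCents = 0;

    // Coupled version: the effect depends on what earlier calls left behind.
    static void addLineCoupled(long cents) {
        runningTotalCents += cents;
    }

    static long totalWithTaxCoupled() {
        long result = runningTotalCents + runningTotalCents / 5; // 20% tax (made up)
        runningTotalCents = 0; // silent reset: an immediate second call returns 0
        return result;
    }

    // Decoupled version: input comes in through the parameter and
    // nothing outside the method is modified.
    static long totalWithTax(long[] lineCents) {
        long subtotal = 0;
        for (long cents : lineCents) {
            subtotal += cents;
        }
        return subtotal + subtotal / 5;
    }
}
```

Calling `totalWithTaxCoupled()` twice in a row gives two different answers, while `totalWithTax` gives the same answer for the same input, regardless of how long or short the surrounding code is.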

A good example of a 1000+ line function I've written for business applications was for processing JSON for the initial state of a web app. You have a lot of data coming in, you need to do a lot of verification and transcribing backend data structures to frontend data structures. It's easier to do it all in one place than it is to break it into many little functions that only get called once anyway.

However, if you have a 1000+ line function that you split into small functions you can pretty easily write a few unit tests per function to see which, if any, of those chunks have problems and then need to be fixed. It's pretty much impossible to write unit tests that can sensibly test a non-trivial 1000+ line function. You might get away with it if it's doing something very straightforward but I wouldn't be very confident in it.

This is fine if you believe that unit testing every 40-line chunk of code is remotely worth the time and effort. I don't think that is true for most applications.

How long does it take you to write and test all those tests? Could you have been doing other things with that time? At 40 lines of functionality, the tests are going to be at least as big as the things you are testing (??), so what kind of a multiplier are you taking just on lines of code written? How much does that cost?

[I run a software company where I pay for the entire burn rate out of my own pocket. So these questions are less academic for me than they are for many people.]

It's true that there's a non-zero cost for each test, but overall I think tests speed up development rather than slow it down (unless you go crazy with the tests, and providing you're fairly decent at writing tests). I don't believe it's worth testing all paths through the code, but I'd aim for significantly over 50% coverage to have any degree of confidence in the codebase.

I estimate I write about two unit tests per function, but tests should be quite a bit faster to write than the code they're testing. I think I'm at the low end of how much I test my code compared to other engineers, however.

Perhaps it is different in game development. One of the big advantages of writing tests is that you can aggressively refactor with confidence; if you're planning to stop improving your codebase once the game is released maybe this isn't an issue? Plus bugs are perhaps less of an issue if you inconvenience the gamer rather than lose someone cash, and maybe you aren't expecting to hand code over to new developers.

So you have a massive switch/case like:

  switch(JSON.someparam) {
    case A: //code to translate A_server -> A_client
    case B: //code to translate B_server -> B_client
    ...
  }

Do you check that all your cases have breaks? That you didn't accidentally introduce a new variable into your scope, or clobber a variable already in scope?
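To make the missing-break hazard concrete, here is a small sketch in Java (an assumption; the snippet above does not say which language it is in). The case labels and the returned strings are hypothetical.

```java
// Hypothetical example: the message kinds and labels are illustrative only.
final class FallthroughDemo {

    // Bug: case 'A' has no break, so control falls through into case 'B'
    // and the value assigned for 'A' is overwritten.
    static String translateBuggy(char kind) {
        String label = "unknown";
        switch (kind) {
            case 'A':
                label = "client-A";
                // missing break
            case 'B':
                label = "client-B";
                break;
            default:
                break;
        }
        return label;
    }

    // Fixed: every case ends with a break.
    static String translate(char kind) {
        String label = "unknown";
        switch (kind) {
            case 'A':
                label = "client-A";
                break;
            case 'B':
                label = "client-B";
                break;
            default:
                break;
        }
        return label;
    }
}
```

The buggy version compiles without error and maps `'A'` to `"client-B"`. In a classic Java `switch`, all cases also share one block scope, which is the second hazard the question raises.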
It's usually the same in the gaming world too.

Carmack in that link isn't exactly giving unqualified support to your position. The first sentence links to where he's now a big fan of functional programming and supports programming using combinations of pure functions.

Obviously, he's a different person and has a different opinion.

My experience has been that people on HN tend to read more into that part of the posting than I do. I think he is saying something pretty obvious, which is that when you can structure things in terms of pure functions, you don't have to worry about the side-effects that are one of the main issues you need to contend with when factoring things apart.

This is different from being a "fan of functional programming", i.e. believing you should use current functional programming languages to build your projects, or whatever.

I'm curious about the 8000 line procedure and what made it the best approach in your case. Also, how do you navigate inside it?

It's the procedure that constructs most of the puzzle panels in the game.

Usually I just search for the name of the puzzle I want to edit (which is also how you'd do it if it were a ton of different procedures).

Hah, that's cheating! I'm not trying to defend an 'N-lines' rule, which seems too obviously silly to even argue about, but people do often break out chunks of code into procedures to give them logical, navigable names. You have names inside your big procedure.

I don't think it's a given that breaking it down introduces complexity. I've found that creating a sub-function for the sole purpose of just giving it a name does wonders for making it easier to understand the code. Names describe intent and are easier to remember than lines of code by themselves.
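A short Java sketch of extracting code purely to name it. The refund rule and every identifier here are hypothetical; both public methods compute the same thing, and only the second says what each half of the condition means.

```java
// Hypothetical example: the rule and the names are illustrative only.
final class RefundPolicy {
    static final int REFUND_WINDOW_DAYS = 30;

    // Inline form: the reader has to decode the condition.
    static boolean canRefundInline(int daysSincePurchase, boolean opened, boolean defective) {
        return daysSincePurchase <= 30 && (!opened || defective);
    }

    // Named form: same logic, but each part states its intent.
    static boolean canRefund(int daysSincePurchase, boolean opened, boolean defective) {
        return withinRefundWindow(daysSincePurchase) && unopenedOrDefective(opened, defective);
    }

    private static boolean withinRefundWindow(int daysSincePurchase) {
        return daysSincePurchase <= REFUND_WINDOW_DAYS;
    }

    private static boolean unopenedOrDefective(boolean opened, boolean defective) {
        return !opened || defective;
    }
}
```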
That mostly depends on where the algorithm is used. If it’s in a tight loop, those extra stack frames are deadly and completely not worth it.

40 lines is excessive. I prefer simple methods that each take care of the ifs, and I link them together serially. AFAICS this is the way the brain works.

  /**
   * get the leaf node as a string
   * @param obj the json object to operate on
   * @param path the dot path to use
   * @return the leaf node if present <b>and textual</b>, otherwise null
   */
  public static String leafString(ObjectNode obj, String path) {
    JsonNode leaf = leaf(obj, path);
    return leaf != null && leaf.isTextual()
        ? leaf.asText()
        : null;
  }
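The snippet above calls a `leaf(obj, path)` helper that the comment does not show. Below is a guess at how such small methods might chain together, written against nested `Map`s instead of the JSON node types so that it runs without any library. The `DotPath` class and the behavior of `leaf` are assumptions, not the commenter's code.

```java
import java.util.Map;

// Hypothetical stand-in: nested Maps play the role of JSON object nodes.
final class DotPath {

    // Walk a dot-separated path through nested maps.
    // Returns null as soon as a step is missing or is not a map.
    static Object leaf(Map<String, Object> root, String path) {
        Object current = root;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Same shape as the method above: delegate the lookup,
    // then handle the one remaining "if" (is the leaf textual?).
    static String leafString(Map<String, Object> root, String path) {
        Object leaf = leaf(root, path);
        return leaf instanceof String ? (String) leaf : null;
    }
}
```

Each method handles one condition and hands the rest to the next, which is the serial linking the comment describes.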