Hacker News new | ask | show | jobs
by wgerard 1340 days ago
My personal addition: Code isolation

I've "prematurely" split up code across multiple/classes functions so many times before these huge god functions became set-in-stone and unwieldy.

I had a co-worker once who was extremely heavy on YAGNI--and I don't mean that as a pejorative, it was a really helpful check against code solving non-existent problems. They once called me out on a refactor I'd done surreptitiously for a payments flow to split payment states into multiple classes. Serendipity had struck, and I was able to point to a submitted (but not accepted) diff that would've broken the entire payments flow (as opposed to just breaking one state of the flow).

I always think about that every time I question whether I'm isolating code prematurely.

3 comments

My rule for this is that:

* If breaking up the code makes it easier to understand with its current functionality then absolutely go ahead.

* If breaking up the code makes it slightly less easy to understand right now, but will make it easier to add some feature later that you know you know you're definitely going to need, then hold off.

Most of the time you "know" you're going to add a feature, you turn out to be wrong, and (worse than that) some other slightly different feature is needed instead that's actually harder to add because you broke up the logic along the wrong axis. Whereas, breaking code up purely based on what makes it easier to understand now, ironically, usually does a much better job at making it easy to add a feature later - it often almost feels like an accident (but it's not, really).

I bet your payment flow refactoring made things a bit easier to understand, even if it had never been worked on again.

> If breaking up the code makes it slightly less easy to understand right now, but will make it easier to add some feature later that you know you know you're definitely going to need, then hold off.

This. Your code shouldn't be documenting what it will do in the future: it should be clear what it's doing right now. Otherwise some other engineer maybe from another team will come upon it, misread it, and bugs ensue. Trying to save yourself time in the future ends up wasting time for others, and likely for yourself as well.

This relates to "Rule of 3 Before Abstracting": One concrete instance of a proposed abstraction is YAGNI (adding an abstraction blurs the concrete code and makes it harder to understand), two concrete instances are coincidence (it's maybe a sign that you need an abstraction between the two, but you don't have enough scientific data to consider), at least three concrete instances is (finally) a pattern (abstracting a known pattern is right and good and because it becomes a document of the high level pattern itself and the differences and distinctions between the concrete implementations become more distinct and obvious).

That "Rule of 3" is sometimes a good reminder itself not to build for some possible future of the codebase, but to build for the patterns in your code base as they exist today.

I hadn't heard of it but it's a good rule! There are too many examples of code that was abstracted out to be shared between uses that end up evolving independently, and the abstraction ends up with a bunch of if-statements and confusion.
Premature class isolation: Usually bad.

Some notable exceptions are things like addresses where many times users have multiple, but usually people splitting up the user table and the "user profile" table are just causing headaches. A wide user table is fine.

Premature method or function isolation: Usually good.

Even if there isn't reuse, the code is usually more readable.

    def let_in_bar?
        return self.is_of_age_in_region? and self.is_sober?
Is a perfectly fine isolation with very little downside.
I know your example is just a toy, but it definitely reminds me of heavy handed over-refactoring into tiny one-line functions.

There are two problems with that approach:

The first is when you're trying to understand what a single function does. If it's 20 lines long but has self-contained logic, that's often easier to understand than referring back to the tiny pieces it's assembled from (even if they're supposedly so self contained you don't need to look at them individually - that's rarely true in practice). On balance, even if a function is a slightly too long, that's usually less bad than one that's in slightly too many little pieces.

The second is when you're looking at the file level. When faced with a file with dozens of tiny functions, it's much harder to get a top level understanding than if it has a smaller number of longer functions.

The second one is the bigger problem, because understanding a project at the file (and higher) level is typically much harder than understanding what a single function does. Even if breaking up a function into smaller pieces does actually make it easier to understand, but makes the overall project harder to parse, then that's usually a loss overall. Of course, it depends very much on the specifics of the situation though, and certainly a long function broken down into logical pieces can actually make the overall project easier to understand.

> over-refactoring into tiny one-line functions

It's relatively rare to refactor a single line of code into its own function, unless it's really hard to read, e.g. some nasty regex.

> even if they're supposedly so self contained you don't need to look at them individually - that's rarely true in practice

If a long function is broken up into smaller functions only for the sake of having smaller functions, then you end up with functions that don't really make sense on their own and, sure, the bigger function is better. But if the broken up functions are named well enough like in the example above, then it shouldn't be necessary to see how it's implemented, unless you're tracking down a bug. After all, it's very rare to look at e.g. how a library function is implemented, and for some libraries/languages it's not even possible.

> When faced with a file with dozens of tiny functions, it's much harder to get a top level understanding than if it has a smaller number of longer functions.

Most languages have facilities to help with that, e.g. function visibility. Your smaller number of large functions can remain the public API for your module while the "dozens" of tiny functions can be private. In either case, long files are harder to take in no matter how many functions they're composed of.

> But if the broken up functions are named well enough like in the example above, then it shouldn't be necessary to see how it's implemented, unless you're tracking down a bug.

(Emphasis added.) Yeah, exactly. Tracking down a bug is one of the times that code clarity is most important, and over eager abstraction is most annoying.

To be fair, what you're championing for PAGNI is the poster child for YAGNI, so you're basically against YAGNI.