Hacker News new | ask | show | jobs
by jplasmeier 3248 days ago
"More readable code almost always means less efficient code."

Can you give examples? I'm not sold that the "low hanging fruit" of readability improvement (empty lines to break up "paragraphs", breaking up unintelligible one-liners into several lines, etc.) degrades performance. My intuition is that the compiler/interpreter ends up producing almost identical runtime code and therefore similar performance.

Unless you mean more along the lines of expressivity (i.e Python and Ruby can get a lot done with fewer statements compared to C), but to me this effect is in a different category than readability.

1 comments

Take the following code, which I had occasion to write this morning:

    foreach (var item in items) {
        if (items.Where(item2 => item.Upc == item2.Upc).Count() != 1) {
            item.Duplicate = true;
        }
    }
This is the most readable and straightforward code I could come up with for the task; it's also not particularly efficient, since it does at least twice as many comparisons as necessary (more if there are actually duplicate items). To make it more efficient would require extra loops and manually indexing them to skip over comparisons that had already made; in this context I decided against this because the performance impact was minimal since there are other far less efficient portions of the code which would best be addressed first if the program turns out to be too slow.
Ever heard of these things called "hashes" a.k.a. "dictionaries"? They are magical boxes for elements that are indexed by name. You build such a magical box of seen elements while going through the checking loop and you get the time complexity of O(n) or O(n log(n)) instead of your current O(n^2), so it won't explode when you get more elements passing through the code.

Note that this was conceived in just a few minutes of reading your code, deciphering syntax only remotely familiar, and figuring out what was it supposed to do, all with cold start (I haven't written much today yet). Something went wrong with your thought process.

First, a small correction: the if statement in my original code can actually be written as the following, which is what I had in the actual code:

    if (items.Count(item2 => item.Upc == item2.Upc) != 1)
With that correction out of the way, here's the code if I used a HashSet (you suggest 'a. k. a. "dictionaries"' but that is slower and I don't need the other half of the mapping which they would provide):

    var set = new HashSet<string>();
    foreach (var item in items) {
        if (set.Contains(item.Upc)) {
            item.Duplicate = true;
        } else {
            set.Add(item.Upc);
        }
    }
To my eye, this is less readable; in addition to having twice as many lines and an extra variable, it also takes extra work to figure out what it does; whereas my version does exactly what it says: if the UPC appears more than once, it marks the item as a duplicate.

For a final comparison, here's an implementation of the extra loops and manual indexing I suggested:

    for (var i = 0; i < items.Count; i++) {
        for (var j = i + 1; j < items.Count; j++) {
            if (items[i].Upc != items[j].Upc) continue;
            items[i].Duplicate = true;
            items[j].Duplicate = true;
            break;
        }
    }
This isn't significantly longer than the HashSet-based version, but it is even faster to run (I just benchmarked it) and also doesn't use an extra object which takes memory and needs to be managed and garbage-collected later.

In conclusion, while I appreciate your suggestion for a possible alternative implementation, I do not appreciate your impugning of my abilities with the suggestion that "something went wrong with your thought process", and implying that I may not be aware of basic data structures.

Dude, you're leaving O(n^2) landmines in your code and you're now justifying them by linecount and readability (actually more like familiarity with using sets and mappings). You have just proved my intuition about your abilities.

Not to menion that:

> you suggest 'a. k. a. "dictionaries"' but that is slower

In what way you think mappings are different in implementation than sets that makes them slower? They only need to carry one more pointer, and you're already writing in a language quite detached from bare metal.

One more thing, since you decided to "correct" yourself instead of letting the mistake slip: your "working" code is still wrong, unless something very weird happens in the data (but then calling it more readable this way is fundamentally wrong).

> Dude, you're leaving O(n^2) landmines in your code

O(n^2) I do not dispute. "Landmines," however, I take offense at: my naive implementation can still check 10,000 items in less than three seconds; should this ever become a bottleneck I assure you that I will adjust the implementation accordingly. Considering that hitting this number would require expanding the entire company at least tenfold, and knowing how other portions of the code work, I feel justified in saying that this is likely to be the least of their problems.

> you're now justifying them by linecount and readability (actually more like familiarity with using sets and mappings).

The entire point of this thread was justifying by readability, so yes, I am. I'm well familiar with sets and mappings, I just don't think that they're the best solution for this particular task.

> In what way you think mappings are different in implementation than sets that makes them slower?

I don't know the internal implementation details, but having just benchmarked it I can tell you that Dictionary<string, Item> is roughly 8% slower than HashSet<string> for two otherwise identical implementations of this method.

> You have just proved my intuition about your abilities.

Likewise. From the preceding conversation, I conclude that you are inclined to prematurely optimize at the cost of maintainability and that you tend to look down on and make fun of anyone who you believe knows less about a topic than you, without taking the time to consider their point of view, or even basic politeness. You may not think that this description accurately depicts you; but then I don't think your opinion of me seems to be correct either.

Considering that I have now spent far more time than this code will ever run for arguing the point, I'm going to step away from this discussion now. I feel that I have made my point; increasingly heated argument seems unlikely to affect the outcome of the debate.

> O(n^2) I do not dispute. "Landmines," however, I take offense at: my naive implementation can still check 10,000 items in less than three seconds

If you're satisfied that the loop you presented runs for a time comparable with three seconds for merely 10000 elements, it says plenty enough about your craft.

> [...] I can tell you that Dictionary<string, Item> is roughly 8% slower than HashSet<string> for two otherwise identical implementations of this method.

Ah yes, benchmark comparing non-implementation (abstract class) with an implementation (concrete class).

> From the preceding conversation, I conclude that you are inclined to prematurely optimize at the cost of maintainability

No. I just tend not to leave behind ridiculous algorithms, especially when an acceptably efficient solution is of comparable code complexity.

> [...] You may not think that this description accurately depicts you; but then I don't think your opinion of me seems to be correct either.

Parts of it, yes, they do match. I'm an asshole that laughs openly whenever sees statements that look ridiculous made by somebody who considers themselves an expert (note that it doesn't matter whether they are or are not actually an expert). But then, it's not my code that takes three seconds to walk through ten thousands elements when -- as napkin calculations show -- it should take under a millisecond.