Hacker News new | ask | show | jobs
by doctor_phil 788 days ago
But a switch and an if-else *is* a matter of algorithmic complexity. (Well, at least could be for a naive compiler). A switch could be converted to a constant time jump, but the if-else would be trying each case linearly.
6 comments

But what if, and stick with me here, a compiler is capable of reading and processing your code and through simple scalar evolution of the conditionals and phi-reduction, it can't tell the difference between a switch statement and a sequence of if statements by the time it finishes its single static analysis phase?

It turns out the algorithmic complexity of a switch statement and the equivalent series of if-statements is identical. The bijective mapping between them is close to the identity function. Does a naive compiler exist that doesn't emit the same instructions for both, at least outside of toy hobby project compilers written by amateurs with no experience?

The issue with if statements (for compiled languages) is not one of "speed" but of correctness.

If statements are unbounded, unconstrained logic constructs, whereas switch statements are type-checkable. The concern about missing break statements here is irrelevant, where your linter/compiler can warn about missing switch cases they can easily warn about non-terminated (non-explicitly marked as fall-through) cases.

For non-compiled languages (so branch prediction is not possible because the code is not even loaded), switch statements also provide a speed-up, i.e. the parser can immediately evaluate the branch to execute vs being forced to evaluate intermediate steps (and the conditions to each if statement can produce side-effects e.g. if(checkAndDo()) { ... } else if (checkAndDoB()) { ... } else if (checkAndDoC()) { ... }

Which, of course, is a potential use of if statements that switches cannot use (although side-effects are usually bad, if you listened to your CS profs)... And again a sort of "static analysis" guarantee that switches can provide that if statements cannot.

While I personally find the if statements harder to immediately mentally parse/grok--as I have to prove to myself that they are all using the same variable and are all chained correctly in a way that is visually obvious for the switch statement--I don't find "but what if we use a naive compiler" at all a useful argument to make as, well, we aren't using a naive compiler, and, if we were, there are a ton of other things we are going to be sad about the performance of leading us down a path of re-implementing a number of other optimizations. The goal of the compiler is to shift computational complexity from runtime to compile time, and figuring out whether the switch table or the comparisons are the right approach seems like a legitimate use case (which maybe we have to sometimes disable, but probably only very rarely).
Per my sibling comment, I think the argument is not about speed, but simplicity.

Awkward switch syntax aside, the switch is simpler to reason about. Fundamentally we should strive to keep our code simple to understand and verify, not worry about compiler optimizations (on the first pass).

Right, and there I would say we even agree, per my first sentence; however, I wanted to reply not to you, but to doctor_phil, who was explicitly disagreeing about speed.
Yup.

That said, the linear test is often faster due to CPU caches, which is why JITs will often convert switches to if/elses.

IMO, switch is clearer in general and potentially faster (at very least the same speed) so it should be preferred when dealing with 3+ if/elseif statements.

Any sufficiently advanced compiler will rewrite those arbitrarily depending on its heuristics. What authors usually forget is that there is defined behavior and specification which the compiler abides by, but it is otherwise free to produce any codegen that preserves the defined program order. Branch reordering, generating jump tables, optimizing away or coalescing checks into branchless forms are all very common. When someone says "oh I write C because it lets you tell CPU how exactly to execute the code" is simply a sign that a person never actually looked at disassembly and has little to no idea how the tool they use works.
A complier will definitely try this, but it's important to note that if/else blocks tell the compiler that "you will run these evaluations in order". Now, if the compiler can detect that the evaluations have no side effects (which, in this simple example with just integer checks, is fairly likely) then yeah I can see a jump table getting shoved in as an optimization.

However, the moment you add a side effect or something more complicated like a method call, it becomes really hard for the complier to know if that sort of optimization is safe to do.

The benefit of the switch statement is that it's already well positioned for the compiler to optimize as it does not have the "you must run these evaluations in order" requirement. It forces you to write code that is fairly compiler friendly.

All that said, probably a waste of time debating :D. Ideally you have profiled your code and the profiler has told you "this is the slow block" before you get to the point of worrying about how to make it faster.

I agree with what you said but in this particular case, it actually was a direct integer equality check, there was zero risk of hitting side effects and that was plainly obvious to me, the checker, and compiler.
And to your original comment, I think the reviewer was wrong to reject the PR over that. Performance has to be measured before you can use it to reject (or create...) a PR. If someone hasn't done that then unless it's something obvious like "You are making a ton of tiny heap allocations in a tight loop" then I think nitpicking these sorts of things is just wrong.
Hard disagree that it's "clearer". I have had to deal with a ton of bugs with people trying to be clever with the `break` logic, or forgetting to put `break` in there at all.

if statements are dumber, and maybe arguably uglier, but I feel like they're also more clear, and people don't try and be clever with them.

Updates to languages (don't know where C# is on this) have different types of switch statements that eliminate the `break` problem.

For example, with java there's enhanced switch that looks like this

    var val = switch(foo) {
     case 1, 2, 3 -> bar;
     case 4 -> baz;
     default -> {
       yield bat();
     }
    }
The C style switch break stuff is definitely a language mistake.
C# has switch statements which are C/C++ style switches and switch expressions which are like Rust's match except no control flow statements inside:

    var len = slice switch
    {
        null => 0,
        "Hello" or "World" => 1,
        ['@', ..var tags] => tags.Length,
        ['{', ..var body, '}'] => body.Length,
        _ => slice.Length,
    };
(it supports a lot more patterns but that wouldn't fit)
C# has both switch expressions like this and also break statements are not optional in traditional switch statements so it actually solves both problems. You can't get too clever with switch statements in C#.

However most languages have pretty permissive switch statements just like C.

Yeah, fair, it's been awhile since I've done any C#, so my memory is a bit hazy with the details. I've been burned C with switch statements so I have a pretty strong distaste for them.
I think using C as your language with which to judge language constructs is hardly fair - one of its main strengths has been as a fairly stable, unchanging code-to-compiler contract, i.e. little to none syntax change or improvements.

So no offense, but I would revisit the wider world of language constructs before claiming that switch statements are "all bad". There are plenty of bad languages or languages with poor implementations of syntax, that do not make the fundamental language construct bad.

This is just forcing return value. You either have to break or return at the branches. To me they all look equivalent
I always set -Werror=implicit-fallthrough, among others. That prevents fallthrough unless explicitly annotated. Sadly these will forever remain optional warnings requiring specific compiler flags, since requiring them could break compiling broken legacy code.
Both the switch and the if have O(1) instructions, so both are the same from an algorithmic complexity perspective.
It's linear with respect to the number of cases, not the size of inputs. It's still O(1) in the sense of algorithmic complexity.
Unless the number of "else if" statements somehow grows e.g. linearly with the size of your input, which isn't plausible, the "else if" statements also execute in O(1) time.