Hacker News new | ask | show | jobs
by uecker 17 days ago
The correct choice IMHO is type-erasure. It does not necessarily have overhead, because optimizers can specialize or devirtualize. Of course, this my depend on how you implement your language, but in C this works nicely. The problem with monomorphization is that it leads to exponential expansion, which later passes of the compiler can not unify again (at least this is much harder than not expanding in the first place). It should also fundamentally limit what you can do, because expansion has to stop at some point, but I haven't thought about this too much.

I also think that where you want monomophization, macro seem fine. I do not think this necessarily has to be clunky, but this is just a guess.

2 comments

Type-erasure does have an inherent overhead. Sure, optimizations can be made, but they can be fickle and specialization is basically implicit monomorphization.

Using C macros to replicate Rust's monomorphism has several drawbacks: they are inherently unhygienic, even in comparison to Rust's own; you can't set type-bounds; they aren't even a part of C proper, etc.

I prefer Rust's approach with the choice between generics, macros, dyn and Any.

What inherent overhead does type-erasure have? The optimizer can always specialize just like for monomorphization. The difference is that it does not have to do this. Monomorphization specializes everything before the optimizer even has a chance to look the the code. So it fundamentally can not have advantages, only restrictions.

C macros are certainly a proper part of C and one can also certainly add type-bounds. But yes, they are not ideal. Still, if one wanted to do this, one could certainly improve them a lot for type-generic programming. I would prefer this to having macros, generics, a const expression sublanguage, and vtables.

The inherent overhead of type-erasure is the vtable. Which is funny, because you said you prefer one over the other at the end, when they are basically synonymous. The compiler may choose to devirtualize and specialize, but that isn't guaranteed. If you want this behavior, &dyn does the exact same thing.

Specialization provides more opportunities for optimizations, whereas proper type-erasure (which bars specialization optimizations) doesn't due to lack of type information.

"C macros" are a part of the preprocessor, which runs before the actual C compiler. As such, it lacks all semantical information the C compiler would have at that point, such as function implementions. In practice, macros in C serve two purposes: manipulating C source code (which Rust macros can also do, but with more hygiene); specialization polymorphism, but worse (in which both Rust's generics and C++ templates do better).

The vtable can be removed in all cases where you specialize by monomorphization, so the overhead compared to it is not really inherent.

Type-erasure can be undone by specialization. In contrast, monomorphization specializes the code first, at which point it becomes much more difficult to unify it again.

Sorry, my comment about preference was badly phrased: What I meant is that I prefer to make macros better, than having all the different but partially overlapping techniques in the same language. This is the complexity I am complaining about in both C++ and Rust.

As I stated in my follow up, vtables can only be removed when the type information is still available, which, generally, aligns with specialization, except when dealing with libraries (even with LTO). You say the "vtable can be removed in all cases of monorphization", but that isn't guaranteed and often requires double guessing the compiler to ensure it still knows the type, where as monorphization makes it explicit when the compiler doesn't know the type anymore and that specialization must occur.

The problem with your suggestion about macros is that it misunderstands the reason why different techniques exist. While they may overlap, they have different primary niches: Rust macros provide codegen and inline DSL, Rust generics/C++ templates provide monomorphism, C++ constexpr/Rust const fn evaluate expressions in compile time, etc.

Extending preprocessor macros, aside from requiring integration into the C language proper, would likely create something much more complex and with poor DX. Specially so if it tries to fulfill all niches, as each of them often operate at different abstraction levels, with different ergonomic requirements that are often at odds with each other.

I just noticed you said "The compiler can always specialize". That's indisputably false. That only happens if the compiler has enough information to infer the concrete type at the call point. Conditionals, collections, or anything that may throw doubt about original type info and their function implementations can (and normally will) disable specialization.
It can always specialize where you could do monomorphization.
Not necessarily. In pure theory, yes. But in pure theory, monomorphized code can be deduplicated too.

Monomorphization often forces types information to flow, whereas type-erasure considers that a bonus. As such, a compiler may accidentally introduce opaque boundaries that hide type information. Even in cases where this is the programmer's fault and the type info is truly lost, in monomorphisation, that becomes an explicit error, while, for devirtualization, since it's just an optimization, the call is silently revirtualized (until someone notices it while profiling/decompiling). One has explicit intent, the other doesn't.

I don't think the numbers bear out that "this works nicely" in C, it seems like you have worse perf numbers for some common cases like sorting ?
Not sure what numbers you are talking about. If you use qsort from the C library the comparison function will not be inlined, but if you provide your own, this is no problem.
> if you provide your own, this is no problem.

If just "providing my own" would help why wouldn't the stdlib benefit too? You're going to have to spell out what you think can actually work here if you want me to believe there's "no problem".

It would also, but nobody cares enough because qsort is already fast enough for most things, and if you cared it is simply enough to do yourself. Are you doubting that C compilers can devirtualize function calls? Here is a small example that illustrates this. The compiler dervirtualizes all calls than folds the result: https://godbolt.org/z/E6cMMr8vx