Hacker News
by flohofwoe 2020 days ago
TBH the length that C++ goes to replace every single use of the preprocessor "just because" is close to zealotry.

Every single "fix" probably requires more lines of code under the hood than the entire preprocessor and in the end you have added tons of additional features to the language to fix problems that (often) don't need fixing.

The preprocessor being a simple text replacement tool is a feature, not a bug, but like every universal tool it requires some common sense to not abuse it.
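A minimal sketch of what "simple text replacement" means in practice, and of the kind of misuse that common sense is supposed to prevent. The names `SQUARE_MACRO`, `square`, `next_value` and the two `calls_via_*` helpers are made up for illustration and do not come from the thread.

```cpp
#include <cassert>

// Textual macro: the argument text is pasted in twice,
// so any side effect in the argument also happens twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Language-level alternative: the argument is evaluated exactly once.
constexpr int square(int x) { return x * x; }

// Hypothetical helper that counts its own calls to make double evaluation visible.
inline int next_value(int& calls) {
    ++calls;
    return 3;
}

// Returns how many times next_value ran when passed through the macro.
inline int calls_via_macro() {
    int calls = 0;
    int result = SQUARE_MACRO(next_value(calls));  // expands to two calls
    assert(result == 9);
    return calls;
}

// Returns how many times next_value ran when passed through the function.
inline int calls_via_function() {
    int calls = 0;
    int result = square(next_value(calls));  // one call
    assert(result == 9);
    return calls;
}
```

Both versions compute the same result here; the difference only shows up once the argument has side effects or is expensive, which is where the "common sense" part comes in.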

6 comments

It would be fine if the preprocessor was actually replaced by something similar in concept, just with the glaring issues fixed like any other macro system developed recently. But instead there's a bunch of ad-hoc rules trying to cover the things people use the preprocessor for.
The reason I don't agree with this is because the preprocessor is the main killer of compiler throughput in a large project, preventing a number of optimizations that would otherwise be possible.
This doesn't pass the sniff test regarding throughput. I've observed both cc and various linkers take hundreds to thousands of seconds on template-heavy and sometimes not-well-organized C++ on machines running around 4 GHz with DDR4, NVMe storage, and plenty of both to not be constrained (1TB RAM, 6TB disk). The preprocessor steps barely register in the bazel profile of my repo, compared to places where we hit slow paths in the compiler and linker due to massive mains which are fundamentally separate programs glued together with a switch/case and a read from a config file.
It should absolutely pass the sniff test.

The cost isn't the cost of parsing. It's the cost of compiling something you've compiled before. If you change a header that is included in N translation units, you compile N translation units, even if you didn't fundamentally change the header contents in a way that would affect the final object files.

But that isn't a cost of the preprocessor. It's the cost of definitions (and declarations in some cases) living somewhere other than the compilation unit in which they are used.

Even with a "module" system as handwaved in TFA, there's still the possibility that you (or someone else) changed a "module". C++ makes it almost impossible to decide if the change requires recompilation (without effectively doing the compilation to decide).

That's only because of #include, however. Preprocessing itself is very fast; it's just that C++ lacks a sane way to import definitions, so it dumps a huge amount of text into the frontend of the compiler.
This is the reason, yes, and it's why macro encapsulation by modules is so important.
How do modules work with generics and cross-module inline functions? Probably I can find the answer in D or Rust but I am not familiar with their mechanisms. Thanks.
In Rust, the library format also includes a pre-compiled version of the generics needed, and so when the compiler includes the library, it can monomorphize from there.
I'm not so sure about that. Surely it would take longer to parse both branches of an if-constexpr than it would for the preprocessor to see the #if and discard half of it.
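A rough sketch of the difference being described, assuming C++17 for `if constexpr`. The macro `USE_FAST_PATH` and both function names are hypothetical. It only illustrates what each mechanism has to look at and says nothing about how the actual costs compare in a real compiler.

```cpp
#include <type_traits>

#define USE_FAST_PATH 1  // hypothetical configuration switch

// Preprocessor: the inactive branch is skipped as raw tokens and never
// reaches the C++ parser, so it does not even have to be valid C++.
inline int pick_with_preprocessor() {
#if USE_FAST_PATH
    return 1;
#else
    this is not valid C++ and is never parsed
    return 0;
#endif
}

// if constexpr: both branches are parsed and must be syntactically valid.
// Inside a template, the discarded branch is just not instantiated.
template <typename T>
constexpr int pick_with_if_constexpr() {
    if constexpr (std::is_integral_v<T>) {
        return 1;
    } else {
        // Would be an error for T = int if this branch were instantiated.
        return static_cast<int>(sizeof(typename T::no_such_member));
    }
}
```

Calling `pick_with_if_constexpr<int>()` compiles because the `else` branch is discarded for integral types, but the parser still had to read it.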
I should clarify. My comment is a statement about the ability for the compiler to reliably cache compiled object files in an incrementally compiling situation.
Overuse of the preprocessor is perhaps the #1 cause of inscrutable compile error messages.

And certainly doesn't help with compile times.

The preprocessor is a big roadblock for C++ modules
Arguably that's a problem that C++ brought onto itself, because it "encourages" putting implementation details into headers (in the form of inline methods and template code). In C it's common to only put public interface declarations into headers, which results in headers being much smaller and much faster to parse (that's why a module system is much more important for C++ than for C).
Could you elaborate on that? Which features of the preprocessor make it impossible to implement C++ modules?
Template instantiation is a simple text replacement tool and it seems to work just fine.
I'm afraid it is not :) Template instantiation has way more rules than simple text replacement. Implicit instantiation is a story of its own, not counting that every compiler is free to implement its own instantiation logic.

Someone once said that the only truly portable thing between C++ compilers is the preprocessor :)

Because of compile-time evaluation and/or template pattern matching, C++ template instantiation is Turing complete. So it's basically as far away from "simple text replacement tool" as you could possibly be.
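To make that concrete, here is a small sketch of the two ingredients mentioned: recursion through instantiation and pattern matching through specialization. `Factorial` and `IsPointer` are illustrative names, and the example shows the mechanisms at work rather than proving Turing completeness.

```cpp
// Recursion: each instantiation depends on the next smaller one.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case, selected by explicit specialization.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Pattern matching: the partial specialization is chosen for any T*.
template <typename T>
struct IsPointer {
    static constexpr bool value = false;
};

template <typename T>
struct IsPointer<T*> {
    static constexpr bool value = true;
};

static_assert(Factorial<5>::value == 120, "computed during instantiation");
static_assert(IsPointer<int*>::value, "matches the T* pattern");
static_assert(!IsPointer<int>::value, "falls back to the primary template");
```

None of this could be done by pasting text: the compiler has to pick specializations, deduce `T` and evaluate constant expressions while instantiating.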
https://github.com/dlang/dmd/blob/master/src/dmd/dtemplate.d

That must be why the implementation of D's templates, which are designed to be easier to implement than C++'s, is at least 8337 lines?

edit: Clang's clocks in at about 11k lines (.cpp alone), I'm too scared to find out for GCC.

GCC's cp/pt.c is around 30k lines. But I don't expect the implementation of templates to be this localized, sure, most of it is probably in that module, but a lot will be strewn about the code base, too.
Is that why templates have such user friendly error messages?
> TBH the length that C++ goes to replace every single use of the preprocessor "just because" is close to zealotry.

This comment is short-sighted.

Take for example C++'s reliance on include guards and textually included headers, instead of modules, just to compile a damn file. No one in their right mind would argue in favour of a preprocessor with #include instead of proper modules if they were to develop a new programming language.

Using #define to specify constant values is also absolutely awful.

Oh my. Once I spent two hours chasing a bug in my colleague's code (uni, not corp), until I arrived at this gem of his:

#define ZERO 1
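For contrast, a small sketch of the same constant written as a macro and as a typed, scoped `constexpr` value. The namespace `limits`, the constant `zero` and the helper `macro_value` are hypothetical names for illustration.

```cpp
#include <type_traits>

#define ZERO 1  // the macro from the anecdote: plain text, no type, no scope

namespace limits {  // hypothetical namespace
// A real constant has a type, lives in a scope, and shows up in debuggers.
constexpr int zero = 0;
static_assert(zero == 0, "checked right next to the definition");
}  // namespace limits

// After preprocessing, the compiler only ever sees the literal 1 here.
inline int macro_value() { return ZERO; }

// The constant's type is visible to the language; the macro has none.
static_assert(std::is_same<decltype(limits::zero), const int>::value,
              "limits::zero is a const int");
```

A misnamed `constexpr` is still possible, of course, but it at least obeys namespaces and cannot silently rewrite unrelated identifiers that happen to be spelled `ZERO`.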

IMHO #include for making declarations visible to other compilation units is the one big feature where ditching the preprocessor makes sense, at least in C++ where headers contain both declarations and implementation code (it's not as critical for C). Simply being able to include a file anywhere in the source is still useful (as a generic text-processing feature), so #include shouldn't be removed even if sharing declarations is solved differently (for instance through a module system).

Same with #define: no harm in adding actual constants to the language, since it's a simple, straightforward and expected feature. But that's no reason to get rid of #define, because it's also useful as a catch-all text replacement which doesn't work on the language level (and that's useful in many situations).
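One example of text replacement that does not work on the language level is turning source text into a string. This is a minimal sketch; `STRINGIFY`, `CHECK_DESCRIPTION` and `describe_sum_check` are made-up names, not anything from the thread.

```cpp
#include <string>

// The # operator turns the argument's source text into a string literal.
// Functions and templates only ever see the value, never the spelling.
#define STRINGIFY(expr) #expr

// Hypothetical helper macro: pairs the spelling of a condition with its result.
#define CHECK_DESCRIPTION(expr) \
    (std::string(#expr) + ((expr) ? " holds" : " fails"))

// Small wrapper so the macro can be used like an ordinary function.
inline std::string describe_sum_check() {
    return CHECK_DESCRIPTION(1 + 1 == 2);
}
```

Assertion and logging macros are typically built on this, which is one reason #define is hard to retire completely even once the language has proper constants.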