Missing the most important case: some external library you need changes a function signature, and you need to be able to compile against either the old or the new library, e.g.:
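A minimal sketch of that situation, assuming a hypothetical library that exposes a version macro and changed one function's signature between releases. FOOLIB_VERSION, foolib_scale and scale_twice are made-up names, and the "library" is faked inline so the snippet compiles on its own:

```cpp
// Hypothetical stand-in for the external header. A real build would get
// FOOLIB_VERSION and foolib_scale from the installed library instead.
#ifndef FOOLIB_VERSION
#define FOOLIB_VERSION 2
#endif

#if FOOLIB_VERSION >= 2
// New signature: an explicit factor parameter was added.
constexpr int foolib_scale(int value, int factor) { return value * factor; }
#else
// Old signature: the factor was fixed at 2.
constexpr int foolib_scale(int value) { return value * 2; }
#endif

// Our own wrapper hides the difference from the rest of the code base.
// The compiler only ever sees the branch that matches the installed version.
constexpr int scale_twice(int value) {
#if FOOLIB_VERSION >= 2
    return foolib_scale(value, 2);
#else
    return foolib_scale(value);
#endif
}
```

Building with -DFOOLIB_VERSION=1 selects the old declaration and the old call together, so the same source compiles against either version.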
This is actually a case where the C preprocessor would be useful in many more languages. OCaml has cppo which is like a better cpp and is very useful for solving these sorts of problems. (https://github.com/ocaml-community/cppo)
In this case this doesn't work because "if constexpr" parses both branches. If the new library version changes function signatures, the if-branch for the old library version produces an error:
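A hedged sketch of where "if constexpr" does and does not help, assuming C++17. OldApi, NewApi and open_with are hypothetical names. Inside a template, a discarded branch that depends on the template parameter is not instantiated, so the two signatures can coexist; in ordinary non-template code both branches are fully checked, which is the failure described above:

```cpp
#include <type_traits>

// Two hypothetical library versions, modelled as types so both can exist
// in one translation unit.
struct OldApi {
    constexpr int open(int path) const { return path; }
};
struct NewApi {
    constexpr int open(int path, int flags) const { return path + flags; }
};

template <typename Api>
constexpr int open_with(const Api& api, int path) {
    if constexpr (std::is_same_v<Api, NewApi>) {
        // Dependent on Api: only checked when instantiated with NewApi.
        return api.open(path, 0);
    } else {
        // Dependent on Api: only checked when instantiated with OldApi.
        return api.open(path);
    }
}

// In a non-template function the calls are not dependent, so a call that
// matches no existing declaration is an error even in the branch that is
// never taken. That is the case where #if is still needed.
```

This only works because the library is reachable through a template parameter; a free function declared in a header that either has the new signature or the old one cannot be handled this way.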
TBH the length that C++ goes to replace every single use of the preprocessor "just because" is close to zealotry.
Every single "fix" probably requires more lines of code under the hood than the entire preprocessor and in the end you have added tons of additional features to the language to fix problems that (often) don't need fixing.
The preprocessor being a simple text replacement tool is a feature, not a bug, but like every universal tool it requires some common sense to not abuse it.
It would be fine if the preprocessor was actually replaced by something similar in concept, just with the glaring issues fixed like any other macro system developed recently. But instead there's a bunch of ad-hoc rules trying to cover the things people use the preprocessor for.
The reason I don't agree with this is because the preprocessor is the main killer of compiler throughput in a large project, preventing a number of optimizations that would otherwise be possible.
This doesn't pass the sniff test regarding throughput. I've observed both cc and various linkers take hundreds to thousands of seconds on template-heavy and sometimes not-well-organized C++ on machines running around 4 GHz with DDR4, NVMe storage, and plenty of both to not be constrained (1TB RAM, 6TB disk). The preprocessor steps barely register in the Bazel profile of my repo, compared to places where we hit slow paths in the compiler and linker due to massive mains which are fundamentally separate programs glued together with a switch/case and a read from a config file.
The cost isn't the cost of parsing. It's the cost of compiling something you've compiled before. If you change a header that is included in N translation units, you compile N translation units, even if you didn't fundamentally change the header contents in a way that would affect the final object files.
But that isn't a cost of the preprocessor. It's the cost of definitions (and declarations in some cases) living somewhere other than the compilation unit in which they are used.
Even with a "module" system as handwaved in TFA, there's still the possibility that you (or someone else) changed a "module". C++ makes it almost impossible to decide if the change requires recompilation (without effectively doing the compilation to decide).
That's only because of #include, however. Preprocessing itself is very fast; it's just that C++ lacks a sane way to import definitions, so it dumps a huge amount of text into the frontend of the compiler.
How do modules work with generics and cross-module inline functions? Probably I can find the answer in D or Rust but I am not familiar with their mechanisms. Thanks.
In Rust, the library format also includes a pre-compiled version of the generics needed, and so when the compiler includes the library, it can monomorphize from there.
I'm not so sure about that. Surely it would take longer to parse both branches of an if-constexpr than it would for the preprocessor to see the #if and discard half of it.
I should clarify. My comment is a statement about the ability for the compiler to reliably cache compiled object files in an incrementally compiling situation.
Arguably that's a problem that C++ brought onto itself, because it "encourages" putting implementation details into headers (in the form of inline methods and template code). In C it's common to only put public interface declarations into headers, which results in headers being much smaller and much faster to parse (that's why a module system is much more important for C++ than for C).
I'm afraid it is not :) Template instantiation has way more rules than simple text replacement. Implicit instantiation is a story on its own, not counting that every compiler is free to implement its own instantiation logic.
Someone said once that the only truly portable thing between C++ compilers is the preprocessor :)
Because of compile-time evaluation and/or template pattern matching, C++ template instantiation is Turing complete. So it's basically as far away from "simple text replacement tool" as you could possibly be.
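As a small illustration of the pattern matching plus recursion that this relies on (not a demonstration of Turing completeness itself), here is a sketch that computes a factorial entirely during template instantiation. The name Factorial is just an example:

```cpp
// Primary template: the recursive case. Instantiating Factorial<N> forces
// the compiler to instantiate Factorial<N - 1>, and so on.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Explicit specialization: the base case, selected by pattern matching on 0.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};
```

Nothing here is text substitution: the compiler has to select specializations, instantiate types and evaluate constant expressions to arrive at the result.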
GCC's cp/pt.c is around 30k lines. But I don't expect the implementation of templates to be this localized. Sure, most of it is probably in that module, but a lot will be strewn about the code base, too.
> TBH the length that C++ goes to replace every single use of the preprocessor "just because" is close to zealotry.
This comment is short-sighted.
Take for example C++'s use of include guards to use translation units instead of modules to just compile a damn file. No one in their right mind would argue in favour of a preprocessor with #include instead of proper modules if they were to develop a new programming language.
Using #define to specify constant values is also absolutely awful.
IMHO #include for making declarations visible to other compilation units is the one big feature where ditching the preprocessor makes sense, at least in C++ where headers contain both declarations and implementation code (it's not as critical for C). Simply being able to include a file anywhere in the source is still useful (simply as a generic text-processing feature), so #include shouldn't be removed if sharing declarations is solved differently (for instance through a module system).
Same with #define: no harm in adding actual constants to the language since it's a simple, straightforward and expected feature. But that's no reason to get rid of #define, because it's also useful as a catch-all text replacement which doesn't work on the language level (and that's useful in many situations).
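A short sketch of both sides of that argument, using made-up names (BUFFER_PADDING, kBufferPadding, STRINGIFY). The macro constant is pasted in as text and picks up the surrounding operator precedence, while the constexpr constant is a typed value; stringification, on the other hand, is something only the text-level tool can do:

```cpp
// A macro "constant" is replaced textually, so the missing parentheses matter.
#define BUFFER_PADDING 1 + 5

// A language-level constant is a single typed value.
constexpr int kBufferPadding = 1 + 5;

constexpr int macro_result = BUFFER_PADDING * 2;  // expands to 1 + 5 * 2, i.e. 11
constexpr int const_result = kBufferPadding * 2;  // 6 * 2, i.e. 12

// Text replacement that has no language-level equivalent: turning the
// spelling of an expression into a string literal.
#define STRINGIFY(x) #x
constexpr const char* kPaddingSpelling = STRINGIFY(1 + 5);  // "1 + 5"
```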
The best thing about the C++ preprocessor is that it is dumb. Text goes in, text goes out. Easy to debug, simple rules. Anything I saw as an alternative either requires a significant amount of code, bending C++ rules, or specialized tools to see what is going on.
Java tried so hard to "do the right thing" by abolishing the preprocessor, and we ended up with another preprocessor called IDE, unnecessary code patterns, and (oh my) Maven profiles for conditional compilation (among other things).
On the other hand, the preprocessor is so dumb that having a normal variable or enum named "OK" or "STATUS" is a risk, even if all of your dependencies are clean. All it takes is a user to include your header and some header that #defines any name in your header to be something else.
So that means you really need to name your preprocessor symbols (and any other all-caps names, because that's the convention) in ways that probably won't collide. Like MYLIB_OK.
So it starts off dumb, but then you have to start layering on convention and defensive programming immediately. And it complicates entire other features of the language, naming constants and enumerated values especially.
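A sketch of the collision and of the defensive naming, with the offending third-party header faked inline. The OK and STATUS macros stand in for whatever some unrelated header happens to define; MyLibResult and the MYLIB_ names are hypothetical:

```cpp
// Stand-in for an unrelated header that the user happened to include first.
#define OK 0
#define STATUS 1

// With those macros visible, the natural spelling no longer compiles,
// because OK is replaced by 0 before the compiler ever sees the enum:
//
//     enum class Result { OK, Failed };   // becomes: enum class Result { 0, Failed };

// Defensive naming: a library prefix makes an accidental clash unlikely.
enum MyLibResult { MYLIB_OK, MYLIB_FAILED };

constexpr bool mylib_succeeded(MyLibResult result) {
    return result == MYLIB_OK;
}
```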
How can we replace nested comments? You can't comment out code that contains /* */ in it except with #if 0
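For reference, a minimal sketch of that idiom. The function name answer is arbitrary:

```cpp
constexpr int answer() {
#if 0
    /* This disabled block already contains a C-style comment, so wrapping
       it in another comment would end at the first closing marker. */
    return 0;
#endif
    return 42;
}
```

The skipped region is dropped by the preprocessor before the compiler proper sees it, so comments inside it need no special care.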
Also, why is std::experimental::source_location loc = std::experimental::source_location::current(); loc.line() better than __LINE__? What an unreadable monster that is!
I mean, don't love the macros - but that still looks like a load of typing to me - and it even reads less well than __LINE__ if you are looking at it from a literate programming perspective.
It's tidier for the compiler though, but the change does not seem to make it easier for the reader of the code to comprehend.
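One place where the difference shows up is helper functions: __LINE__ written inside a function always reports the line of the helper, so capturing the caller's position takes a macro wrapper. A sketch of that wrapper, with CallSite, make_call_site and HERE as made-up names:

```cpp
struct CallSite {
    const char* file;
    int line;
};

constexpr CallSite make_call_site(const char* file, int line) {
    return CallSite{file, line};
}

// The macro has to be expanded at the call site, because __LINE__ is
// replaced with the line number of wherever the text ends up.
#define HERE() make_call_site(__FILE__, __LINE__)

constexpr CallSite first_site = HERE();
constexpr CallSite second_site = HERE();  // exactly one line below first_site
```

With C++20, std::source_location::current() used as a default argument captures the caller's position in the same way without a macro, which is the main argument for it despite the extra typing.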
A few years ago Herb Sutter proposed Python-like metaclasses in C++ through compile-time code generation. Not sure if anything has been proposed for general use.
To a degree, but templates and constexpr don't support a bunch of features like compile-time field enumeration and introspection + code generation.
For example, let's say I have a bunch of structs:
#include <string>
#include <vector>

struct GeoCoordinate {
    int lat, lon;  // "long" is a reserved keyword in C++
};
struct GeoArea {
    std::vector<GeoCoordinate> perimeter;
};
struct Place {
    std::string name;
    std::string contact_number;
    GeoArea area;
};
Now I need to serialize these structs into a format to be sent over the wire. Currently, I have a few choices:
1. Use an off-the-shelf library like protobuf (disclaimer: I work for Google). Then I have to convert my code to a protobuf definition and rely on its code generator to perform [de]serialization. I also have to hope that my library supports all the field definitions I need.
2. Write macros to define each field in each structure. These macros perform some arcane magicks that somehow create the necessary [de]serialization functions. These macros are difficult to write and maintain (or I find a library).
3. Manually define the methods myself. This is tedious, hard to maintain, and error prone.
What if I could write some code in C++ which could read the structure and generate the appropriate serialization code? Something like (syntax hypothetical):
Serializable(Class) {
    std::string serialize() {
        std::string output;
        for (auto member : Class.members()) { // loop unrolled at compile time
            if (member.type == int) {
                output.append(std::format("{:10}", member.get()));
            } else if (member.type == std::string) {
                ...
            } else if (member.type == std::vector) {
                ...
            } else if (std::has_metaclass_v<member.type, Serializable>) {
                output.append(member.get().serialize());
            }
        }
        return output;
    }
};
Then I could annotate my classes with Serializable instead.
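For comparison, the macro route (option 2 in the list above) can be sketched today with the X-macro idiom. This is only an illustration on the smallest struct: the GEO_COORDINATE_FIELDS list, the "name=value;" output format and the helper macro names are all made up for the example:

```cpp
#include <cassert>
#include <string>

// The field list is written exactly once. Every use supplies its own X,
// which is applied to each (type, name) pair.
#define GEO_COORDINATE_FIELDS(X) \
    X(int, lat)                  \
    X(int, lon)

struct GeoCoordinate {
#define DECLARE_FIELD(type, name) type name;
    GEO_COORDINATE_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Generated from the same list, so adding a field updates both places.
inline std::string serialize(const GeoCoordinate& c) {
    std::string out;
#define APPEND_FIELD(type, name) out += #name "=" + std::to_string(c.name) + ";";
    GEO_COORDINATE_FIELDS(APPEND_FIELD)
#undef APPEND_FIELD
    return out;
}

// The list can also be reduced to a compile-time field count.
#define COUNT_FIELD(type, name) +1
constexpr int kGeoCoordinateFieldCount = 0 GEO_COORDINATE_FIELDS(COUNT_FIELD);
#undef COUNT_FIELD
```

It works, but the struct definition is no longer readable as a struct, and nested or container-typed fields need more machinery, which is the maintenance burden the list above complains about.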
Yeah static structural reflection is an important use-case. The main objection I’ve heard to standardization in C++ has to do with maintaining the mistake of conflating structs and classes and supporting C’s long broken unit type.
TLDR of the article: the new C++ features can replace some macro usages, but not all.
"With current C++(17), most of the preprocessor use can’t be replaced easily."
"And even then: I think that proper macros, which are part of the compiler and very powerful tools for AST generation, are a useful thing to have. Something like Herb Sutter’s metaclasses, for example. However, I definitely don’t want the primitive text replacement of #define."
People often voice concerns over the type-safety or performance or flexibility of the preprocessor, arguing that since those all leave something to be desired, the preprocessor should be replaced. I’d like to make a few comments on those points.
First, and perhaps controversially, the preprocessor is type-safe; it just isn’t the same type system that C and C++ use. The syntactic elements that make up the preprocessor language like parentheses, commas, whitespace, hash signs and alphanumeric characters have their own unique types, and can only be used in contexts where those types are expected. You’ll receive an error if your preprocessor program tries to token-paste parentheses, or end function-like macro invocations with whitespace instead of parentheses, or skip commas in macro arguments when they’re expected. It’s important that people stop thinking of the preprocessor as “the thing that turns BIG_ALL_CAPS_CONSTANTS into C code”; the preprocessor is its own distinct language, and it’s purely by coincidence and some nudging by people involved in the early days of C 50 years ago that it happens to have its language interpreter run during the C compilation process.
As far as performance goes, the implementations used by the big three compilers are horrific in terms of memory usage (reaching tens of gigabytes in larger preprocessor programs, nothing ever gets freed) and processing speed (exponential algorithms galore). Clang’s preprocessor still isn’t fully standard-compliant even today. Heck, it took until 2020 for MSVC to get the /Zc:preprocessor flag to enable correct functionality. Twenty years after the last major addition! There’s a lot to be desired with the tools we use, even taking into account the complex macro expansion rules that some faster preprocessors (see: Warp) break to trade functionality for speed. It could be argued that any language that takes that long to get correct (let alone performant) implementations built is worth replacing to get rid of that complexity alone, but it’s worth keeping in mind that what we’re working with today could be much, much better than it is.
Lastly, the crappiness of the preprocessor as a general-purpose code generation language is greatly exaggerated, and the complaint mostly comes down to it not being Turing-complete. Yes, there’s no such thing as direct recursion with macros. But there is such a thing as indirect recursion, where each scan applied by the preprocessor can evaluate a macro again even if it was just evaluated. So, if you can set up a chain of macros that is capable of applying some huge number of scans (2^32, 2^64, whatever), even if that number is finite, it’s enough to do any conceivable code generation task. https://github.com/rofl0r/order-pp/blob/master/doc/notes.txt is the poster child of where that idea gets you: a functional programming language built on the preprocessor that can output any sequence of preprocessing tokens, with high-level language features like closures, lexical scoping, first-class functions, arbitrary precision arithmetic, eval, call/cc, etc.
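A toy sketch of that indirect recursion, far smaller than what Order-PP does and not taken from it. All macro names here are made up, and it assumes a standard-conforming preprocessor (GCC, Clang, or MSVC with /Zc:preprocessor). COUNTDOWN never names itself directly; it emits a deferred COUNTDOWN_INDIRECT, and each extra scan forced by EVAL turns that back into one more COUNTDOWN call:

```cpp
#define EMPTY()
// Postpones the expansion of `id` by one scan: after this scan only the
// bare macro name is left, followed by whatever comes next in the text.
#define DEFER(id) id EMPTY()

// Each EVAL level rescans its argument, giving the deferred calls the
// additional passes they need. The depth here is finite and small.
#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) __VA_ARGS__

#define CAT(a, b) a##b

// Dispatch on the argument, then schedule the next step indirectly.
#define COUNTDOWN(n) CAT(COUNTDOWN_, n)
#define COUNTDOWN_0 0
#define COUNTDOWN_1 1, DEFER(COUNTDOWN_INDIRECT)()(0)
#define COUNTDOWN_2 2, DEFER(COUNTDOWN_INDIRECT)()(1)
#define COUNTDOWN_3 3, DEFER(COUNTDOWN_INDIRECT)()(2)
#define COUNTDOWN_INDIRECT() COUNTDOWN

// Intended expansion: 3, 2, 1, 0
constexpr int kCountdown[] = { EVAL(COUNTDOWN(3)) };
```

Without the EVAL wrapper the expansion stops after the first step and leaves COUNTDOWN_INDIRECT ()(2) in the output, which is exactly the "one macro evaluation per scan" limit described above.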
The preprocessor is still the most powerful metaprogramming and language extension tool available in C++, since it’s the only tool we have to just.. generate code. No necessary reliance on compiler optimization to translate our recursive pattern-matching sfinae’d templates and constexpr functions into the code we expect. Just plain, simple text. I think that’s beautiful, and it’s not something that’s easy to replace.