Is the preprocessor still needed in C++? (2017) (foonathan.net)
43 points by appehuli 2020 days ago
10 comments

Missing the most important case: Some external library you need changes a function signature and you need to be able to compile against the old or the new library, eg:

  #if LIBVERSION >= 2
    draw_point (2, 3, RED);
  #else
    set_color (RED);
    draw_point (2, 3);
  #endif
This is actually a case where the C preprocessor would be useful in many more languages. OCaml has cppo which is like a better cpp and is very useful for solving these sorts of problems. (https://github.com/ocaml-community/cppo)
You can use "if constexpr" for that.
In this case this doesn't work because "if constexpr" parses both branches. If the new library version changes function signatures, the if-branch for the old library version produces an error:

https://www.godbolt.org/z/4KK997

PS: interesting to note that Zig does the "right thing":

https://www.godbolt.org/z/9c13PY
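
For contrast, a minimal sketch (assuming C++17) of the one situation where the discarded branch really is skipped: inside a template, when the branch depends on the template parameter. Nothing in the library-version example is dependent, which is why both branches get checked there. `Named` and `describe` are made-up names for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical type, used only for this illustration.
struct Named {
    std::string name() const { return "named"; }
};

template <class T>
std::string describe(const T& value) {
    if constexpr (std::is_integral_v<T>) {
        return std::to_string(value);
    } else {
        // This branch depends on T, so it is only instantiated for
        // non-integral T; describe(42) compiles although int has no name().
        return value.name();
    }
}

inline void check_describe() {
    assert(describe(42) == "42");
    assert(describe(Named{}) == "named");
}
```

Move the same `if`/`else` into a non-template function and the compiler checks both branches in full.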

There was a proposal to do the correct thing and copy D's static if, but it was rejected for fairly contrived reasons IIRC.

Andrei Alexandrescu mentions it in a talk: if constexpr doesn't really do much of anything useful because it introduces a scope.

Even though one or other branch of the if-statement won't be valid C++? How does it know that set_color is a function if it isn't defined anywhere?
TBH the length that C++ goes to replace every single use of the preprocessor "just because" is close to zealotry.

Every single "fix" probably requires more lines of code under the hood than the entire preprocessor and in the end you have added tons of additional features to the language to fix problems that (often) don't need fixing.

The preprocessor being a simple text replacement tool is a feature, not a bug, but like every universal tool it requires some common sense to not abuse it.

It would be fine if the preprocessor was actually replaced by something similar in concept, just with the glaring issues fixed like any other macro system developed recently. But instead there's a bunch of ad-hoc rules trying to cover the things people use the preprocessor for.
The reason I don't agree with this is because the preprocessor is the main killer of compiler throughput in a large project, preventing a number of optimizations that would otherwise be possible.
This doesn't pass the sniff test regarding throughput. I've observed both cc and various linkers take hundreds to thousands of seconds on template-heavy and sometimes not-well-organized C++ on machines running around 4 GHz with DDR4, NVMe storage, and plenty of both to not be constrained (1TB RAM, 6TB disk). The preprocessor steps barely register in the bazel profile of my repo, compared to places where we hit slow paths in the compiler and linker due to massive mains which are fundamentally separate programs glued together with a switch/case and a read from a config file.
It should absolutely pass the sniff test.

The cost isn't the cost of parsing. It's the cost of compiling something you've compiled before. If you change a header that is included in N translation units, you compile N translation units, even if you didn't fundamentally change the header contents in a way that would affect the final object files.

But that isn't a cost of the preprocessor. It's the cost of definitions (and declarations in some cases) living somewhere other than the compilation unit in which they are used.

Even with a "module" system as handwaved in TFA, there's still the possibility that you (or someone else) changed a "module". C++ makes it almost impossible to decide if the change requires recompilation (without effectively doing the compilation to decide).

That's only because of #include, however. Preprocessing is very fast; it's just that C++ lacks a sane way to import definitions, so it dumps a huge amount of text into the frontend of the compiler.
This is the reason yes, and why macro encapsulation by modules is so important
How do modules work with generics and cross-module inline functions? Probably I can find the answer in D or Rust but I am not familiar with their mechanisms. Thanks.
In Rust, the library format also includes a pre-compiled version of the generics needed, and so when the compiler includes the library, it can monomorphize from there.
I'm not so sure about that. Surely it would take longer to parse both branches of an if-constexpr than it would for the preprocessor to see the #if and discard half of it.
I should clarify. My comment is a statement about the ability for the compiler to reliably cache compiled object files in an incrementally compiling situation.
Overuse of the preprocessor is perhaps the #1 cause of inscrutable compile error messages.

And certainly doesn't help with compile times.

The preprocessor is a big roadblock for C++ modules
Arguably that's a problem that C++ brought onto itself, because it "encourages" putting implementation details into headers (in the form of inline methods and template code). In C it's common to only put public interface declarations into headers, which results in headers being much smaller and much faster to parse (that's why a module system is much more important for C++ than for C).
Could you elaborate on that? Which features of preprocessor make it impossible to implement C++ modules?
Template instantiation is a simple text replacement tool and it seems to work just fine.
I'm afraid it is not :) Template instantiation has way more rules than simple text replacement. Implicit instantiation is a story on its own, not counting that every compiler is free to implement its own instantiation logic.

Someone said once that the only truly portable thing between C++ compilers is the preprocessor :)

Because of compile-time evaluation and/or template pattern matching, C++ template instantiation is Turing complete. So it's basically as far away from "simple text replacement tool" as you could possibly be.
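
A tiny taste of what that looks like, as a sketch rather than a proof of Turing completeness: recursion comes from one instantiation requesting another, and the branch that stops it is a specialization chosen by pattern matching. `Factorial` is just an illustrative name.

```cpp
#include <cassert>

// Recursive case: each instantiation asks the compiler for the next one.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case: a full specialization, selected by matching on the argument 0.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Evaluated entirely by the compiler; no code runs for this check.
static_assert(Factorial<5>::value == 120, "5! should be 120");

inline void check_factorial() {
    assert(Factorial<10>::value == 3628800ULL);
}
```

No text substitution produces this: the compiler has to do overload-like matching, instantiate types on demand, and evaluate constant expressions.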
https://github.com/dlang/dmd/blob/master/src/dmd/dtemplate.d

That must be why the implementation of D's templates, which are designed to be easier to implement than C++'s, is at least 8337 lines?

edit: Clang's clocks in at about 11k lines (.cpp alone), I'm too scared to find out for GCC.

GCC's cp/pt.c is around 30k lines. But I don't expect the implementation of templates to be this localized, sure, most of it is probably in that module, but a lot will be strewn about the code base, too.
Is that why templates have such user friendly error messages?
> TBH the length that C++ goes to replace every single use of the preprocessor "just because" is close to zealotry.

This comment is short-sighted.

Take for example C++'s use of include guards to use translation units instead of modules to just compile a damn file. No one in their right mind would argue in favour of a preprocessor with #include instead of proper modules if they were to develop a new programming language.

Using #define to specify constant values is also absolutely awful.
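
A minimal sketch of the classic failure mode, with made-up names (`SIX_MACRO`, `six_constant`): the macro is pasted as raw tokens, so operator precedence at the use site changes its meaning, while the constexpr constant is a value with a type.

```cpp
#include <cassert>

#define SIX_MACRO 1 + 5              // pasted into the use site as raw tokens
constexpr int six_constant = 1 + 5;  // evaluated first, then used as a value

inline void check_constants() {
    // Expands to 1 + 5 * 2, and precedence makes that 11.
    assert(SIX_MACRO * 2 == 11);
    // The constant behaves like the number it names.
    assert(six_constant * 2 == 12);
}
```

Wrapping the macro body in parentheses fixes this particular case, but the constant needs no such care and also respects scopes and namespaces.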

Oh my. Once I spent two hours chasing a bug in my colleague's code (uni, not corp), until I arrived at this gem of his:

#define ZERO 1

IMHO #include for making declarations visible to other compilation units is the one big feature where ditching the preprocessor makes sense, at least in C++ where headers contain both declarations and implementation code (it's not as critical for C). Simply being able to include a file anywhere in the source is still useful (simply as a generic text-processing feature) so #include shouldn't be removed if sharing declarations is solved differently (for instance through a module system).

Same with #define, no harm in adding actual constants to the language since it's a simple, straightforward and expected feature. But that's no reason to get rid of #define, because it's also useful as a catch-all text replacement which doesn't work on the language level (and that's useful in many situations).

The best thing about C++ preprocessor is that it is dumb. Text goes in, text goes out. Easy to debug, simple rules. Anything I saw as an alternative either requires a significant amount of code, bending C++ rules, or specialized tools to see what is going on.

Java tried so hard to "do the right thing" by abolishing the preprocessor, and we ended up with another preprocessor called IDE, unnecessary code patterns, and (oh my) Maven profiles for conditional compilation (among other things).

On the other hand, the preprocessor is so dumb that having a normal variable or enum named "OK" or "STATUS" is a risk, even if all of your dependencies are clean. All it takes is a user to include your header and some header that #defines any name in your header to be something else.

So that means you really need to name your preprocessor symbols (and any other all-caps names, because that's the convention) in ways that probably won't collide. Like MYLIB_OK.

So it starts off dumb, but then you have to start layering on convention and defensive programming immediately. And it complicates entire other features of the language, naming constants and enumerated values especially.
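
Roughly what that defensive layering looks like, as a sketch: the `#define OK 1` line stands in for some unrelated third-party header, and `mylib`, `Status`, and `MYLIB_MAX_RETRIES` are hypothetical names.

```cpp
#include <cassert>

// Stand-in for some unrelated third-party header (hypothetical).
#define OK 1

namespace mylib {
// An enumerator spelled OK here would be rewritten to `1` by the macro
// above and fail to compile, so the enum avoids all-caps names.
enum class Status { Ok, Failed };
}  // namespace mylib

// Macros ignore namespaces, so the library's own macro carries a prefix.
#define MYLIB_MAX_RETRIES 3

inline void check_names() {
    assert(static_cast<int>(mylib::Status::Ok) == 0);
    assert(OK == 1);  // the foreign macro is untouched
    assert(MYLIB_MAX_RETRIES == 3);
}
```

The namespace protects `Status` from other C++ names, but only the naming convention protects it from macros.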

D has no preprocessor and has none of those problems.
How can we replace nested comments? You can't comment out code that contains /* */ in it except with #if 0

Also, why is std::experimental::source_location loc = std::experimental::source_location::current(); loc.line better than __LINE__? what an unreadable monster that is!

Source location is so much more than that macro!!

- it has file name, line number, and char number! That already makes the number of characters more similar if that’s your metric

- it can be forwarded/passed around. It’s much harder to pass macros around

- it can easily capture the caller’s location rather than the location of the macro
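
A small sketch of the caller-location point, kept to pre-C++20 features so it builds anywhere: a plain function that uses `__LINE__` only ever reports its own line, so the macro approach needs a macro at every call site. `std::source_location::current()` used as a default argument is evaluated at the call site, which removes that need. `line_in_helper` and `LINE_AT_CALL_SITE` are illustrative names.

```cpp
#include <cassert>

// A plain function only ever sees its own line number.
inline int line_in_helper() { return __LINE__; }

// A macro is expanded where it is used, so it reports the caller's line.
#define LINE_AT_CALL_SITE() (__LINE__)

inline void check_lines() {
    int first = LINE_AT_CALL_SITE();
    int second = LINE_AT_CALL_SITE();
    assert(second == first + 1);                   // two different call sites
    assert(line_in_helper() == line_in_helper());  // always the helper's line
}
```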

Since #define will still exist, I'd suggest

   #define __WHERE__ std::source_location::current()
:)
Presumably it would become

  auto loc = std::source_location::current();
at some point, which seems fair enough to me.
I mean, don't love the macros - but that still looks like a load of typing to me - and it even reads less well than __LINE__ if you are looking at it from a literate programming perspective.

It's tidier for the compiler though, but the change does not seem to make it easier for the reader of the code to comprehend.

It is since C++20. Comment author tries to make code verbose to make their point or does not know about auto and the state of C++.
I simply copied it from the article (minus the surrounding function).

Fortunately you only need to write this at the utility function, not at every call site, so it's ok.

In general I find various new features in C++ suffer from verbosity, but of course without experimental this one gets better.

auto in function signatures is just really, really stupid.
The idea is to get the location of the caller without using a macro.

D just uses __LINE__, because it hasn't got a preprocessor so the compiler can resolve the token properly.

The implementation of that is at https://github.com/dlang/dmd/blob/v2.094.2/src/dmd/expressio...

> How can we replace nested comments?

s/^/\/\// (and the reverse) work well for me. It nests.

> How can we replace nested comments?

you may use multiline string literals

Care to elaborate on how that's supposed to work in practice?
Wow, ugly, but it works:

    (void) R"long-comment(
    /* C-style comment */
    std::cout << "hello" << std::endl;
    )long-comment";
Ugly, confusing, hard to undo, and only works within a function (since it's a statement).
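
For comparison, a sketch of the #if 0 approach mentioned upthread. Existing /* */ comments inside the disabled region are harmless, and unlike the string-literal trick it also works outside a function. `answer` is a made-up function.

```cpp
#include <cassert>

inline int answer() {
    int value = 42;
#if 0
    /* An existing C-style comment does not end the disabled region. */
    value = 0;  /* neither does this one */
#endif
    return value;
}

inline void check_answer() {
    // The assignment inside the #if 0 region was never compiled.
    assert(answer() == 42);
}
```

Flipping the `0` to `1` re-enables the code, which is about as easy to undo as it gets.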
Just yesterday I had to wrap up offsetof in a macro for use in some pseudo-reflection code

    #define MEMBER(C, M) { offsetof(C, M), sizeof(C::M) }
I couldn't figure out a nice way to do this without the preprocessor. The best I came up with was to use a lambda:

    [] (const C& c) { return std::cref(c.m); }
But these are stored in a std::map which means I have to use function pointers or accept the overhead of std::function
Rather than storing the offset within the struct, you could store a type-erased pointer to data member:

    struct M {
        std::byte M::*p;
        std::size_t l;
        template<class C, class T>
        M(T C::*e) : p{reinterpret_cast<std::byte M::*>(e)}, l{sizeof(T)} {}
    };
Also there are ways (not necessarily legal) to convert a pointer to data member to an offset; see the proposal http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...
We just need an interpreter that can work on C++ files, supported by the standard.

Something akin to the https://www.python.org/dev/peps/pep-0638/

Code generators/transformers are rare only because it's so hard to actually start.

Too bad https://www.circle-lang.org never took off.

A few years ago Herb Sutter proposed Python-like metaclasses in C++ through compile-time code generation. Not sure if anything has been proposed for general use.

https://www.youtube.com/watch?v=4AfRAVcThyA

Unreal Engine 4 creates C++ reflection classes by parsing the headers and some of their custom macros that you define for member variables & functions.

It basically allows you full C++ reflection, but only for classes that you mark with the UE4 macros.

Isn’t that what templates and constexpr are?
To a degree, but templates and constexpr don't support a bunch of features like compile-time field enumeration and introspection + code generation.

For example, let's say I have a bunch of structs:

  struct GeoCoordinate {
    int lat, lng;  // "long" is a keyword, so it can't be a member name
  };

  struct GeoArea {
    std::vector<GeoCoordinate> perimeter;
  };

  struct Place {
    std::string name;
    std::string contact_number;
    GeoArea area;
  };
Now I need to serialize these structs into a format to be sent over the wire. Currently, I have a few choices:

1. Use an off-the-shelf library like protobuf (disclaimer: I work for Google). Then I have to convert my code to a protobuf definition and rely on its code generator to perform [de]serialization. I also have to hope that my library supports all the field definitions I need.

2. Write macros to define each field in each structure. These macros perform some arcane magicks that somehow create the necessary [de]serialization functions. These macros are difficult to write and maintain (or I find a library).

3. Manually define the methods myself. This is tedious, hard to maintain, and error prone.
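
To make option 2 concrete, here is a minimal X-macro sketch. It is not taken from any particular library: `PLACE_FIELDS`, the simplified `Place`, the `floor_count` field, and the `key=value;` output format are all assumptions for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Single list of fields; every use below is generated from it.
#define PLACE_FIELDS(X)            \
    X(std::string, name)           \
    X(std::string, contact_number) \
    X(int, floor_count)

struct Place {
#define DECLARE_FIELD(type, ident) type ident{};
    PLACE_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

inline std::string serialize(const Place& p) {
    std::ostringstream out;
    // #ident turns the field name into a string literal.
#define WRITE_FIELD(type, ident) out << #ident << '=' << p.ident << ';';
    PLACE_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
    return out.str();
}

inline void check_serialize() {
    Place p;
    p.name = "Cafe";
    p.contact_number = "555";
    p.floor_count = 2;
    assert(serialize(p) == "name=Cafe;contact_number=555;floor_count=2;");
}
```

Adding a field means touching one line, but nested or container-typed members need more machinery, which is where this gets hard to maintain.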

What if I could write some code in C++ which could read the structure and generate the appropriate serialization code? Something like (syntax hypothetical):

  Serializable(Class) {
    std::string serialize() {
      std::string output;
      for (auto member : Class.members()) { // loop unrolled at compile time
        if (member.type == int) {
          output.append(std::format("{:10}", member.get()));
        } else if (member.type == std::string) {
          ...
        } else if (member.type == std::vector) {
          ...
        } else if (std::has_metaclass_v<member.type, Serializable>) {
          output.append(member.get().serialize());
        }
      }
      return output;
    }
  };

Then I could annotate my classes with Serializable instead.

See https://www.youtube.com/watch?v=4AfRAVcThyA.

Yeah static structural reflection is an important use-case. The main objection I’ve heard to standardization in C++ has to do with maintaining the mistake of conflating structs and classes and supporting C’s long broken unit type.
The preprocessor is still needed to implement a routine that allocates memory on the stack in a cross platform way.
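
A sketch of why, assuming only MSVC, GCC, and Clang matter: the primitive is spelled differently per compiler, and it has to be wrapped in a macro because memory allocated inside a wrapper function would be released when that function returned. `STACK_ALLOC` and `copy_and_measure` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(_MSC_VER)
#  include <malloc.h>
#  define STACK_ALLOC(n) _alloca(n)
#elif defined(__GNUC__)  // GCC and Clang
#  define STACK_ALLOC(n) __builtin_alloca(n)
#else
#  error "no stack allocation primitive known for this compiler"
#endif

// The buffer lives in this function's frame and vanishes when it returns.
inline std::size_t copy_and_measure(const char* text) {
    std::size_t length = std::strlen(text);
    char* buffer = static_cast<char*>(STACK_ALLOC(length + 1));
    std::memcpy(buffer, text, length + 1);  // includes the terminating '\0'
    return std::strlen(buffer);
}

inline void check_stack_alloc() {
    assert(copy_and_measure("hello") == 5);
}
```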
TLDR of the article: the new C++ features can replace some macro usages, but not all.

"With current C++(17), most of the preprocessor use can’t be replaced easily."

"And even then: I think that proper macros, which are part of the compiler and very powerful tools for AST generation, are a useful thing to have. Something like Herb Sutter’s metaclasses, for example. However, I definitely don’t want the primitive text replacement of #define."

People often voice concerns over the type-safety or performance or flexibility of the preprocessor, arguing that since those all leave something to be desired, the preprocessor should be replaced. I’d like to make a few comments on those points.

First, and perhaps controversially, the preprocessor is type-safe; it just isn’t the same type system that C and C++ use. The syntactic elements that make up the preprocessor language like parentheses, commas, whitespace, hash signs and alphanumeric characters have their own unique types, and can only be used in contexts where those types are expected. You’ll receive an error if your preprocessor program tries to token-paste parentheses, or end function-like macro invocations with whitespace instead of parentheses, or skip commas in macro arguments when they’re expected. It’s important that people stop thinking of the preprocessor as “the thing that turns BIG_ALL_CAPS_CONSTANTS into C code”; the preprocessor is its own distinct language, and it’s purely by coincidence and some nudging by people involved in the early days of C 50 years ago that it happens to have its language interpreter run during the C compilation process.

As far as performance goes, the implementations used by the big three compilers are horrific in terms of memory usage (reaching tens of gigabytes in larger preprocessor programs, nothing ever gets freed) and processing speed (exponential algorithms galore). Clang’s preprocessor still isn’t fully standard-compliant even today. Heck, it took until 2020 for MSVC to get the /Zc:preprocessor flag to enable correct functionality. Twenty years after the last major addition! There’s a lot to be desired with the tools we use, even taking into account the complex macro expansion rules that some faster preprocessors (see: Warp) break to trade functionality for speed. It could be argued that any language that takes that long to get correct (let alone performant) implementations built is worth replacing to get rid of that complexity alone, but it’s worth keeping in mind that what we’re working with today could be much, much better than it is.

Lastly, the crappiness of the preprocessor as a general-purpose code generation language is greatly exaggerated, mostly because it isn’t Turing-complete. Yes, there’s no such thing as direct recursion with macros. But, there is such a thing as indirect recursion, where each scan applied by the preprocessor can evaluate a macro again even if it was just evaluated. So, if you can set up a chain of macros that is capable of applying some huge number of scans (2^32, 2^64, whatever), even if that number is finite, it’s enough to do any conceivable code generation task. https://github.com/rofl0r/order-pp/blob/master/doc/notes.txt is the poster child of where that idea gets you; a functional programming language built on the preprocessor that can output any sequence of preprocessing tokens, with high-level language features like closures, lexical scoping, first-class functions, arbitrary precision arithmetic, eval, call/cc, etc.
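
The building block behind that trick, as a minimal sketch assuming a standard-conforming preprocessor (GCC, Clang, or MSVC with /Zc:preprocessor): a macro call can be postponed so that it survives one scan, and each extra scan then evaluates one more pending call. The macro names follow a common idiom and are not from any particular library.

```cpp
#include <cassert>
#include <string>

#define EMPTY()
#define DEFER(id) id EMPTY()     // postpone the call to `id` by one scan
#define EXPAND(...) __VA_ARGS__  // force one additional scan

#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)  // expand the argument, then stringify

#define A() 123

// After a single scan the call to A is still pending...
static const std::string pending = XSTR(DEFER(A)());
// ...and one extra scan finishes it.
static const std::string finished = XSTR(EXPAND(DEFER(A)()));

inline void check_deferred() {
    assert(pending.find("123") == std::string::npos);
    assert(pending[0] == 'A');
    assert(finished == "123");
}
```

Stack enough EXPAND layers and a macro that re-emits a deferred reference to itself gets one step of "recursion" per scan, which is the finite-but-huge budget described above.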

The preprocessor is still the most powerful metaprogramming and language extension tool available in C++, since it’s the only tool we have to just.. generate code. No necessary reliance on compiler optimization to translate our recursive pattern-matching sfinae’d templates and constexpr functions into the code we expect. Just plain, simple text. I think that’s beautiful, and it’s not something that’s easy to replace.

tl;dr: Yes

Despite the author clearly disliking the preprocessor, for justified reasons, most of the article is about how essential it still is.