by foundry27 2020 days ago

People often voice concerns over the type-safety or performance or flexibility of the preprocessor, arguing that since those all leave something to be desired, the preprocessor should be replaced. I’d like to make a few comments on those points.

First, and perhaps controversially, the preprocessor is type-safe; it just doesn’t use the same type system that C and C++ use. The syntactic elements that make up the preprocessor language, like parentheses, commas, whitespace, hash signs and alphanumeric characters, have their own unique types, and can only be used in contexts where those types are expected. You’ll receive an error if your preprocessor program tries to token-paste parentheses, or end function-like macro invocations with whitespace instead of parentheses, or skip commas in macro arguments when they’re expected. It’s important that people stop thinking of the preprocessor as “the thing that turns BIG_ALL_CAPS_CONSTANTS into C code”; the preprocessor is its own distinct language, and it’s purely by coincidence, and some nudging by people involved in the early days of C 50 years ago, that its language interpreter happens to run during the C compilation process.
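To make the "types" point concrete, here is a minimal sketch of token pasting and stringizing used where they are allowed. The names `CAT`, `STR`, `PREFIX`, `read_hits` and `hits_name` are made up for illustration and do not come from any library.

```c
#include <string.h>

/* Two-level helpers: the outer macro lets its arguments be macro-expanded
 * before the inner macro applies ## or # to them. */
#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)
#define STR_(x) #x
#define STR(x) STR_(x)

#define PREFIX counter_

/* CAT(PREFIX, hits) pastes two identifiers into the identifier counter_hits. */
static int CAT(PREFIX, hits) = 3;

int read_hits(void) { return counter_hits; }

/* STR(CAT(PREFIX, hits)) becomes the string literal "counter_hits". */
const char *hits_name(void) { return STR(CAT(PREFIX, hits)); }

/* A paste whose result is not a single valid preprocessing token is
 * ill-typed in the sense described above. For example:
 *     CAT_(foo, ())
 * tries to paste "foo" with "(". GCC and Clang diagnose this as an error;
 * the standard itself leaves the behaviour undefined. */
```

Identifier-with-identifier pasting is accepted because the result is again an identifier; identifier-with-parenthesis has no valid result token, which is the kind of mismatch the preprocessor complains about.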

As far as performance goes, the implementations used by the big three compilers are horrific in terms of memory usage (reaching tens of gigabytes in larger preprocessor programs, nothing ever gets freed) and processing speed (exponential algorithms galore). Clang’s preprocessor still isn’t fully standard-compliant even today. Heck, it took until 2020 for MSVC to get the /Zc:preprocessor flag to enable correct functionality. Twenty years after the last major addition! There’s a lot to be desired with the tools we use, even taking into account the complex macro expansion rules that some faster preprocessors (see: Warp) break to trade functionality for speed. It could be argued that any language that takes that long to get correct (let alone performant) implementations built is worth replacing to get rid of that complexity alone, but it’s worth keeping in mind that what we’re working with today could be much, much better than it is.

Lastly, the crappiness of the preprocessor as a general-purpose code generation language is greatly exaggerated, mostly on the grounds that it isn’t Turing-complete. Yes, there’s no such thing as direct recursion with macros. But there is such a thing as indirect recursion, where each scan applied by the preprocessor can evaluate a macro again even if it was just evaluated. So, if you can set up a chain of macros that is capable of applying some huge number of scans (2^32, 2^64, whatever), even if that number is finite, it’s enough to do any conceivable code generation task. Order-pp (https://github.com/rofl0r/order-pp/blob/master/doc/notes.txt) is the poster child of where that idea gets you: a functional programming language built on the preprocessor that can output any sequence of preprocessing tokens, with high-level language features like closures, lexical scoping, first-class functions, arbitrary precision arithmetic, eval, call/cc, etc.
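A small sketch of the indirect recursion idea, using one commonly seen pattern rather than anything taken from Order-pp. It assumes `__VA_OPT__` is available (C23, C++20, or recent GCC/Clang as an extension), and all macro and function names here are illustrative.

```c
/* Assumption: the compiler supports __VA_OPT__. */
#define PARENS ()

/* Each EXPAND level forces the preprocessor to rescan its argument again.
 * Nesting them gives a few hundred scans: finite, but plenty for this demo. */
#define EXPAND(...)  EXPAND4(EXPAND4(EXPAND4(EXPAND4(__VA_ARGS__))))
#define EXPAND4(...) EXPAND3(EXPAND3(EXPAND3(EXPAND3(__VA_ARGS__))))
#define EXPAND3(...) EXPAND2(EXPAND2(EXPAND2(EXPAND2(__VA_ARGS__))))
#define EXPAND2(...) EXPAND1(EXPAND1(EXPAND1(EXPAND1(__VA_ARGS__))))
#define EXPAND1(...) __VA_ARGS__

/* FOR_EACH_HELPER never names itself. It emits FOR_EACH_AGAIN followed by
 * PARENS, which only turns into a call of FOR_EACH_HELPER on a later scan,
 * so the "macro is already being expanded" rule never blocks it. */
#define FOR_EACH(macro, ...) \
    __VA_OPT__(EXPAND(FOR_EACH_HELPER(macro, __VA_ARGS__)))
#define FOR_EACH_HELPER(macro, a1, ...) \
    macro(a1)                           \
    __VA_OPT__(FOR_EACH_AGAIN PARENS (macro, __VA_ARGS__))
#define FOR_EACH_AGAIN() FOR_EACH_HELPER

/* Demo: generate "+ (x)" for every argument. */
#define ADD_TERM(x) + (x)

int sum_demo(void)
{
    /* Expands to: return 0 + (1) + (2) + (3) + (4); */
    return 0 FOR_EACH(ADD_TERM, 1, 2, 3, 4);
}
```

The number of list elements this can handle is bounded by how many rescans `EXPAND` provides, which is the "huge but finite" trade-off described above; adding another `EXPAND` level multiplies the budget.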

The preprocessor is still the most powerful metaprogramming and language extension tool available in C++, since it’s the only tool we have to just... generate code. No necessary reliance on compiler optimization to translate our recursive pattern-matching SFINAE’d templates and constexpr functions into the code we expect. Just plain, simple text. I think that’s beautiful, and it’s not something that’s easy to replace.
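As one everyday illustration of "just generate code", here is a sketch of the X-macro idiom, where a single list is expanded into both an enum and a matching string table. The list contents and names (`COLOR_LIST`, `color_name`, and so on) are hypothetical.

```c
/* Single source of truth: the list applies X to each entry. */
#define COLOR_LIST(X) \
    X(RED)            \
    X(GREEN)          \
    X(BLUE)

/* Two "code generators" over the same list. */
#define AS_ENUM(name) COLOR_##name,
#define AS_STRING(name) #name,

/* Generates: enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT }; */
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };

/* Generates: { "RED", "GREEN", "BLUE", } */
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };

const char *color_name(enum color c)
{
    /* Fall back to "?" for values outside the generated range. */
    return ((int)c >= 0 && (int)c < COLOR_COUNT) ? color_names[c] : "?";
}
```

Adding a new entry to `COLOR_LIST` updates the enum and the table together, and the output is ordinary text that you can inspect with the compiler's preprocess-only mode.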