by mynegation 2852 days ago
It takes hundreds, if not thousands, of person-years to write a C++ compiler intentionally; it is damn near impossible to write one accidentally, as the title implies.

Frank is a talented engineer; I used to follow his work on Calca very closely. But as others noted, this does not seem to be anywhere close to C++. The problem with parsing C++ starts with C. Say you see the code "T(t);" at the beginning of a function body. What is it? A declaration of variable t of type T? A call of function T on variable t? You cannot parse it properly without a symbol table. No context-free parser handles C properly, let alone C++. It gets progressively and exponentially worse from here.
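To make the ambiguity concrete, here is a minimal sketch in C++ where the same tokens, "T(t);", mean two different things depending on what T was declared as. The namespaces and the demo functions are made up for illustration.

```cpp
// Case 1: T names a type, so `T(t);` declares a variable t of type T.
// The parentheses around the declarator are redundant but legal.
namespace as_declaration {
struct T { int value = 7; };

int demo() {
    T(t);            // parsed as a declaration, same as `T t;`
    return t.value;  // t is a local object of type T
}
}  // namespace as_declaration

// Case 2: T names a function, so `T(t);` is an expression statement
// that calls T with the argument t.
namespace as_call {
int calls = 0;
void T(int) { ++calls; }

int demo() {
    int t = 0;
    int before = calls;
    T(t);                   // parsed as a function call
    return calls - before;  // number of calls made by this statement
}
}  // namespace as_call
```

Some compilers warn about the redundant parentheses in the first case, which is a hint that they had to decide between the two readings.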

2 comments

> You cannot parse it properly without a symbol table. No context-free parser handles C properly, let alone C++. It gets progressively and exponentially worse from here.

C is actually pretty easy if you ignore all of the bad advice to use parser generators. I looked into the various YACC grammars for C I could find on the Internet, and all of them either had bugs or were incomplete. TCC[1] has a simple recursive-descent parser. With a recursive descent parser you also have the option of implementing the C pre-processor in the same step. Turns out I was able to implement a single-pass C parser and pre-processor as a bunch of Common Lisp read macros[2].

I have not looked into it, but the approach for C++ looks like it would be very different because template instantiation needs its own step.

[1] https://bellard.org/tcc/
[2] https://github.com/vsedach/Vacietis/blob/master/compiler/rea...

Parsing is the easy part of a C++ compiler, and the "T(t);" example you mentioned has a relatively simple solution (which is, as you say, using the symbol table). In fact, that ambiguity is present in C as well (with typedefs), and yet it hasn't stopped many others from writing their own C compilers that handle that case.
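As a rough sketch of what "using the symbol table" can look like: when the parser sees a statement of the shape "name(arg);", it asks whether name is currently known to be a type. SymbolTable, StatementKind, and classify are hypothetical names invented here, and a real compiler would track this per scope and handle shadowing.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical, heavily simplified symbol table: only the set of names
// currently known to be types (for example, introduced by typedef).
struct SymbolTable {
    std::unordered_set<std::string> type_names;

    bool is_type(const std::string& name) const {
        return type_names.count(name) != 0;
    }
};

enum class StatementKind { Declaration, Call };

// Classify a statement of the shape `name(arg);` by asking the symbol
// table whether `name` is a type. Anything else is treated as a call.
StatementKind classify(const SymbolTable& symbols, const std::string& name) {
    return symbols.is_type(name) ? StatementKind::Declaration
                                 : StatementKind::Call;
}
```

With an empty table, classify(symbols, "T") yields Call; after inserting "T" into type_names (as a typedef would), the same query yields Declaration.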

It's things like templates, multiple/virtual inheritance, and argument-dependent lookup/overload resolution that are difficult to get right, and they are the reason that even somewhat fully featured C++ compilers are rare.
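A small illustration of how those features interact, with made-up names (lib, Widget, describe): a single unqualified call has to combine ordinary lookup, argument-dependent lookup, template argument deduction, and overload resolution before the compiler knows which function is meant.

```cpp
namespace lib {
struct Widget {};

// Not visible by ordinary unqualified lookup from the global namespace,
// but found by argument-dependent lookup when an argument is a lib::Widget.
int describe(const Widget&) { return 1; }
}  // namespace lib

// A generic overload in the global namespace.
template <typename T>
int describe(const T&) { return 2; }

int pick_for_widget() {
    lib::Widget w;
    // Candidates: the global template (ordinary lookup) and lib::describe
    // (argument-dependent lookup). Both are exact matches, and the
    // non-template function is preferred.
    return describe(w);
}

int pick_for_int() {
    // int has no associated namespace, so only the template is a candidate.
    return describe(42);
}
```

Here pick_for_widget() selects lib::describe and pick_for_int() selects the template, even though both calls look identical at the token level.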