by kaslai 3045 days ago
The C preprocessor is the least of your worries with regard to compile times. Parsing a #if 0 can be done almost as quickly as a memcmp operation. Even my naive implementation can process an #if 0 at around 600 MBps. Even if you had a gigabyte of text in an #if 0, that's only about 2 more seconds on the compile, provided your disk can manage the throughput.
3 comments

> Even if you had a gigabyte of text in an #if 0, that's only about 2 more seconds on the compile

Are you sure your implementation is correct? How does it handle this:

    #if 0
    "\
    #endif \
    "
    #endif

I'm not saying gcc's and clang's preprocessors are not really fast, but preprocessing is trickier than most people expect. In particular (as you can see from my example), while skipping over an "#if 0" you still have to split the source into tokens and discard the tokens.

Absolutely, great example! I have looked at quite a few implementations of preprocessors, and they're not simple.

I don't trust anyone who thinks it's simple without actually having implemented it -- and tested it on real code.

One older thread: https://news.ycombinator.com/item?id=10945552

I agree with the sentiment, but it should be easy enough to smash line continuations together while you're searching for that #endif (famous last words)
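
To make the "smash line continuations together" idea concrete, here is a rough C sketch of skipping an #if 0 group by walking logical lines (backslash-newline spliced) and tracking nesting depth. The names skip_if0, next_logical_line, skip_splices and directive_name are made up for illustration and are not taken from anyone's implementation in this thread. It copes with the string example above, since after splicing that line begins with a quote rather than a #, but it deliberately ignores comments, trigraphs and \r\n line endings, so a #endif inside a multi-line /* ... */ comment would still fool it. That gap is where real tokenizing comes in.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip any backslash-newline pairs (line continuations) starting at pos. */
static size_t skip_splices(const char *s, size_t n, size_t pos) {
    while (pos + 1 < n && s[pos] == '\\' && s[pos + 1] == '\n') pos += 2;
    return pos;
}

/* Return the index just past the logical line that starts at pos.
   A newline preceded by a backslash does not end the logical line. */
static size_t next_logical_line(const char *s, size_t n, size_t pos) {
    while (pos < n) {
        if (s[pos] == '\\' && pos + 1 < n && s[pos + 1] == '\n') pos += 2;
        else if (s[pos] == '\n') return pos + 1;
        else pos++;
    }
    return n;
}

/* If the logical line at pos starts with '#', copy the directive name
   (possibly empty) into name[] and return 1; otherwise return 0. */
static int directive_name(const char *s, size_t n, size_t pos,
                          char *name, size_t cap) {
    size_t len = 0;
    pos = skip_splices(s, n, pos);
    while (pos < n && (s[pos] == ' ' || s[pos] == '\t'))
        pos = skip_splices(s, n, pos + 1);
    if (pos >= n || s[pos] != '#') return 0;
    pos = skip_splices(s, n, pos + 1);
    while (pos < n && (s[pos] == ' ' || s[pos] == '\t'))
        pos = skip_splices(s, n, pos + 1);
    while (pos < n && isalpha((unsigned char)s[pos]) && len + 1 < cap) {
        name[len++] = s[pos];
        pos = skip_splices(s, n, pos + 1);
    }
    name[len] = '\0';
    return 1;
}

/* pos points just past the "#if 0" line. Returns the index just past the
   matching #endif line, or n if the group is never closed.
   Assumption: no comment handling, so this is not a conforming skipper. */
size_t skip_if0(const char *s, size_t n, size_t pos) {
    int depth = 1;
    char name[16];
    while (pos < n) {
        if (directive_name(s, n, pos, name, sizeof name)) {
            if (strcmp(name, "if") == 0 || strcmp(name, "ifdef") == 0 ||
                strcmp(name, "ifndef") == 0)
                depth++;                    /* nested conditional opens */
            else if (strcmp(name, "endif") == 0)
                depth--;                    /* one level closes */
        }
        pos = next_logical_line(s, n, pos);
        if (depth == 0) return pos;
    }
    return n;
}
```

Calling skip_if0(src, strlen(src), 6) on the example above (6 being the offset just past the "#if 0" line) should land on whatever follows the final #endif. Every byte is still inspected, so this is slower than a plain memcmp-style scan for "#endif".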
Is your naive preprocessor implementation complete and standards-compliant C99? Or just some bastardization of it? If not, the 600 MBps bench is as good as useless.

What's the C preprocessor speed of GCC/Clang/MSVC? Any benches?

Some tests on my system (Intel 4770HQ), including everything at the root of /usr/include:

    $ for file in /usr/include/*.{h,hpp} ; do printf '#include "%s"\n' "$file" >> /tmp/test.cpp ; done
    $ (laborious step consisting of removing headers that don't play nice with others... in the end there are still more than 1425 base headers, which end up including most of Qt4 & Qt5, boost, etc.)
    $ time g++ -E /tmp/test.cpp -fPIC -std=c++1z -I/usr/include/glib-2.0 -I/usr/include/qt -I/usr/include/glibmm-2.4 -I/usr/include/glib-2.0/glib/  -I/usr/lib/glib-2.0/include -I/usr/include/gdkmm-2.4 -I/usr/include/gdkmm-3.0 -I/usr/include/gtk-3.0 -I/usr/include/pango-1.0 -I/usr/include/cairo -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/atk-1.0 -I/usr/include/qt4/ -I/usr/include/qt/QtCore -I/tmp -I/usr/include/qt/QtWidgets -I/usr/include/qt/QtGui -I/usr/include/KDE -D_FILE_OFFSET_BITS=64 -DPACKAGE=1 -DPACKAGE_VERSION=1 -I /usr/include/raptor2 -w -I/usr/include/rasqal -I/usr/include/lirc/include -Wno-fatal-errors -I/usr/include/lirc -I/usr/include/gegl-0.3 -I/usr/include/libavcodec -I/usr/include/freetype2 -DPCRE2_CODE_UNIT_WIDTH=8 -I/usr/include/python3.6m -I/usr/include/libusb-1.0 > out.txt

    1,82s user 0,18s system 99% cpu 2,013 total

out.txt (the whole preprocessed output) is 710 kloc and 23 megabytes.

With clang:

    0,72s user 0,08s system 91% cpu 0,875 total 

So I'd say that in general, preprocessing time is fairly negligible. A few template instantiations will take much more time to compile.

Well, you have to multiply it by the number of translation units.

Also, I like the point made in the sibling comment by moefh. Are you sure you don't need to tokenize?

Can I see your implementation? I have looked at a few implementations of C preprocessors, and they're not simple.