by G4E 1374 days ago
You can also use pycparser[0]. It is fully C99-compatible, but be careful: it doesn't support GNU extensions (like attributes, #ident, asm() ...). You can, however, work around most of them by defining them as empty macros with -D in the preprocessor arguments.

[0] https://github.com/eliben/pycparser
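
For anyone wanting to try the -D workaround, here is a rough sketch of what it might look like. The keyword list and both helper names are made up for illustration; only `parse_file` and its `use_cpp`, `cpp_path` and `cpp_args` parameters are pycparser's own. It assumes a `cpp` binary on the PATH, and it does not try to handle asm() statements, whose top-level commas would not fit a one-argument macro.

```python
# Hypothetical helpers; only pycparser.parse_file is a real library call.

# GNU keywords to erase. True means the keyword is function-like and
# takes one (possibly parenthesised) argument; False means a bare token.
GNU_KEYWORDS = {
    "__attribute__": True,
    "__extension__": False,
    "__restrict": False,
}


def empty_macro_args(keywords=GNU_KEYWORDS):
    """Build -D flags that define each keyword as an empty macro."""
    args = []
    for name, takes_arg in keywords.items():
        args.append(f"-D{name}(x)=" if takes_arg else f"-D{name}=")
    return args


def parse_header(path, cpp_path="cpp"):
    """Preprocess `path` with the empty macros, then parse it to an AST."""
    # Imported lazily so the flag builder works without pycparser installed.
    from pycparser import parse_file  # third-party: pip install pycparser
    return parse_file(path, use_cpp=True, cpp_path=cpp_path,
                      cpp_args=empty_macro_args())
```

Calling `parse_header("foo.h")` would then return the AST, or raise pycparser's parse error if something unsupported is still left in the preprocessed output.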


Right, pycparser is what CFFI uses. I’ve seen some really cryptic error messages when it tries to process some of my C header files (since worked around), and I’m curious what else is out there. The ability to preserve info about formatting that the OP noted is especially interesting.

As long as we’re on this tangent, here’s the challenge I’m facing with automated analysis of student code: from foo.c make bar.c which is identical to foo.c except that comments have been turned into spaces. I think this is annoyingly non-trivial.

To remove comments from source, all you need is a tokenizer. You don't need all the tokens, just the "preprocessing tokens", for instance string literals, pp-numbers, and so on. Then /* comments */ can be replaced with one space and // comments with \n.
Multi-line comments would need to be turned into multiple blank lines, but yes, thank you for pointing out that I've been over-thinking this. I will look into what is the path of least resistance for this tokenizer-based transformation.
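
A minimal sketch of that tokenizer-style pass, in Python, might look like the following. The function name is made up, and the scanner only knows about string literals, character literals and the two comment forms. It assumes no trigraphs and no backslash-newline splices inside a // comment or between the two characters of a comment delimiter. Every comment character becomes a space and newlines inside comments are kept, so line and column positions in bar.c match foo.c.

```python
def blank_comments(src):
    """Return `src` with every comment character replaced by a space.

    Newlines inside comments are kept, so the remaining code stays at
    the same line and column as in the input.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch in "\"'":
            # String or character literal: copy verbatim up to the closing
            # quote (or end of line if it is unterminated), skipping escapes.
            j = i + 1
            while j < n and src[j] != ch and src[j] != "\n":
                j += 2 if src[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(src[i:j])
            i = j
        elif src.startswith("//", i):
            # Line comment: blank everything up to, not including, the newline.
            j = src.find("\n", i)
            j = n if j == -1 else j
            out.append(" " * (j - i))
            i = j
        elif src.startswith("/*", i):
            # Block comment: blank everything through "*/", keeping newlines.
            j = src.find("*/", i + 2)
            j = n if j == -1 else j + 2
            out.append("".join(c if c == "\n" else " " for c in src[i:j]))
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Reading foo.c, passing its text through `blank_comments`, and writing the result to bar.c gives a file of the same length with the same number of lines, which makes it easy to map any later diagnostics back to the student's original source.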