Hacker News new | ask | show | jobs
by thradams 1374 days ago
To remove comments from source all you need is a tokenizer. You don't need all the tokens, just the "preprocessor tokens". For instance literal strings, ppnumbers ... Then /comments/ can be replaced with 1 space and //comments by \n
1 comments

Multi-line comments would need to be turned into multiple blank lines, but yes, thank you for pointing out that I've been over-thinking this. I will look into what is the path of least resistance for this tokenizer-based transformation.