Hacker News new | ask | show | jobs
by rcorcs 3653 days ago
I know there are hundreds of C compilers and that I am just reinventing the wheel. But my main goal is just learning. I have already written another compiler before, but it was just for a toy language. For this reason, I would like to build a compiler for a "real-life" language, and C is an important but yet small language. I want to master the whole process of a fully working compiler for a "real-life" language, and afterwards continue to build on top of this knowledge, since I have been doing research on automatic parallelisation, and I am interested in optimising compilers in general.

Even if this project doesn't turn out to be useful for a lot of people, I hope at least to inspire a few people to tackle big problems.

6 comments

I'm curious to know your reasoning behind using a generated parser instead of writing your own. This might be the first "just for fun" C compiler I've seen on HN or elsewhere that doesn't use some variant of recursive-descent/precedence, which is employed even by the "real" compilers like GCC, Clang, ICC, and MSVC.

If you are doing this for learning then I'd recommend studying C4's parser (https://news.ycombinator.com/item?id=8558822 ) and this article:

https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm

The precedence-climbing algorithm is so amazingly simple and concise that I think anyone playing with compilers should implement a parser using it at least once, just to experience its astounding elegance. IMHO seeing an entire programming language's AST parsed almost entirely using a single recursive function with very little code and a table concisely describing its grammar can be quite exhilarating.

> IMHO seeing an entire programming language's AST parsed almost entirely using a single recursive function with very little code and a table concisely describing its grammar can be quite exhilarating.

I recently implemented a toy interpreter for a small language which uses a hand-written, table-driven shift-reduce parser. The parser simply maintains a stack of terminals and non-terminals and applies the corresponding reduction if a suffix of the stack matches a rule in the table. Precedence rules are resolved through an additional "should_shift" function, keeping the grammar very understandable. I think it's a very elegant algorithm:

https://github.com/bbu/simple-interpreter

The code base looks nice and easy to navigate. I didn't see any license mentioned - I'd suggest stating one clearly for those considering to build something on top of it, or to contribute.
You can also look here for advices: https://www.reddit.com/r/Compilers/ There are people who is specialized on developing compilers.
Automatic parallelisation is a bad idea because the overhead caused by synchronisation is bigger than the gain. It's also a bad idea to take away control from the programmer. What if every compiler did automatic parallelisation differently? Then you'd have impossible to fix performance problems.
> Automatic parallelisation is a bad idea because the overhead caused by synchronisation is bigger than the gain.

A sufficiently intelligent compiler would be able to heuristically determine if the synchronization overhead for a program segment would be greater than the performance gain. Plus, there's lots of little cases where synchronization overhead is trivial, like map().

C isn't the best language for this, but there's plenty of languages (e.g. Haskell) that have data flows that can be thoroughly statically analyzed for dependencies.

> It's also a bad idea to take away control from the programmer.

Ideally you'd have compiler pragma to disable this on a case by case basis, and you'd still have more traditional parallel programming tools available.

> What if every compiler did automatic parallelisation differently? Then you'd have impossible to fix performance problems.

If you had a bad enough performance regression, you'd disable automatic parallelization and file a compiler bug (because it shouldn't regress below single-core performance). You could make this same argument against any optimizing compiler. They all apply different optimizations, and do sometimes lead to "impossible" to fix performance problems or even contradictory execution behaviors between implementations. But I still think `gcc -03` is invaluable.

I don't think automatic parallelization is a cure-all, or even that it should be enabled by default on compilers any time soon, but it's better to have it as an option than not.

This is awesome. Thank you for sharing it with us. :)
kudos to you, this is something I don't think I could ever accomplish but I know it's not easy. Keep learning and growing.