Hacker News new | ask | show | jobs
by Rochus 60 days ago
I don’t like LLVM either, because its size and complexity are simply spiraling out of control, and especially because I consider the IR to be a total design failure. If I use LLVM at all, it would be version 4.0.1 or 3.4 at most. But it is the standard, especially if you want to run tests related to the question the fellow asked above. The alternative would be to build a frontend for GCC, but that is no less complex or time-consuming (and ultimately, you’re still dependent on binutils). However, C on LLVM or GCC should probably be considered the “upper bound” when it comes to how well a program can be optimized, and thus the benchmark for any performance measurement.
3 comments

> However, C on LLVM or GCC should probably be considered the “upper bound” when it comes to how well a program can be optimized, and thus the benchmark for any performance measurement.

Is it? Isn't it rather the case that C is too low level to express intent and (hence) offer room to optimize? I would expect that a language in which, e.g. matrix multiplication can be natively expressed, could be compiled to more efficient code for such.

I would rather expect, that for compilers which don't optimize well, C is the easiest to produce fairly efficient code for (well, perhaps BCPL would be even easier, but nobody wants to use that these days).

> I would expect that a language in which, e.g. matrix multiplication can be natively expressed, could be compiled to more efficient code for such.

That's exactly the question we would hope to answer with such an experiment. Given that your language received sufficient investments to implement an optimal LLVM adaptation (as C did), we would then expect your language to be significantly faster on a benchmark heavily depending on matrix multiplication. If not, this would mean that the optimizer can get away with any language and the specific language design features have little impact on performance (and we can use them without performance worries).

Rochus, your point about LLVM and the 'upper bound' of C optimization is a bit of a bitter pill for systems engineers. In my own work, I often hit that wall where I'm trying to express high-level data intent (like vector similarity semantics) but end up fighting the optimizer because it can't prove enough about memory aliasing or data alignment to stay efficient.

I agree with guenthert that higher-level intent should theoretically allow for better optimization, but as you said, without the decades of investment that went into the C backends, it's a David vs. Goliath situation.

The 'spiraling complexity' of LLVM you mentioned is exactly why some of us are looking back at leaner designs. For high-density data tasks (like the 5.2M documents in 240MB I'm handling), I'd almost prefer a language that gives me more predictable, transparent control over the machine than one that relies on a million-line optimizer to 'guess' what I'm trying to do. It feels like we are at a crossroads between 'massive compilers' and 'predictable languages' again.

When you call LLVM IR a design failure, do you mean its semantic model (e.g., memory/UB), or its role as a cross-language contract? Is there a specific IR propert that prevents clean mapping from Oberon?
Several historical design choices within the IR itself have created immense complexity, leading to unsound optimizations and severe compile-time bloat. It's not high-level enough so you e.g. don't have to care about ABI details, and it's not low-level enought to actually take care of those ABI details in a decent way. And it's a continuous moving target. You cannot implement something which then continus to work.
To be fair they also kind of share that opinion, hence why MLIR came to be, first only for AI, nowadays for everything, even C is going to get its own MLIR (ongoing effort).