by WalterBright 524 days ago
> if you're working within a translation unit, that's much simplified, but then you're much more limited in what you can do without repeating a lot of code. I wonder how the author solves this.

You are correct in that the source code to the function being evaluated must be available to the compiler. This can be done with #include. I do it in D by importing the modules with the needed code.

> This is already somewhat possible if you can express your test as a macro, which if you add in the first point, then this becomes trivial.

Expressing the test as a macro doesn't work when you want to test the function. The example I gave was trivial to make it easy to understand. Actual use can be far more complex.

> Performance

D is faster at compiling than C compilers, mainly because:

1. The C preprocessor is a hopeless pig with its required multiple passes. I know, I implemented it from scratch multiple times. The C preprocessor was an excellent design choice when it was invented. Today it is a fossil. I'm still amazed that C++ has never gotten around to deprecating it.

2. D uses import rather than #include. This is just way, way faster, as the .h files don't need to be compiled over and over and over and over and over ...

D's strategy is to separate the parse from the semantic analysis. I suppose it is a hair slower, but it also doesn't have to recompile the duplicate declarations and fold them into one.

Compile time function execution can be a bottleneck, sure, but that (of course) depends on how heavily it is used. I tend to use it with a light touch and the performance is fine. If you implement a compiler using it (as people have done!) it can be slow.

> One of the ways I have kind of implemented templating in C is by defining a variable and #including a C file, changing the variable, and then #including the same C file again. Another thing I've done is define a bunch of things, then #include the SQLite C Amalgamation and add another function (I do this to expose a SQLite internal which isn't exposed via its headers). All of these use cases would break with this change.

I am not suggesting removing #include for C. The import thing would be additive.

> Are there any thoughts about these issues?

If you're using hacks to do templating in C, you've outgrown the language and need a more powerful one. D has top shelf metaprogramming - and as usual, other template languages are following in D's path.
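The "define something, then include the same .c file again" trick quoted above works because every inclusion re-expands the same text under different macro definitions. A self-contained snippet can't show two files, so this sketch swaps in a closely related single-file technique: a macro whose body plays the role of the re-included file. `DEFINE_MAX`, `max_int` and `max_double` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a re-included "max_impl.c": in the file-based
   version the caller #defines T and SUFFIX, then #includes the file again
   for each type. Here one macro plays the role of that file's contents. */
#define DEFINE_MAX(T, SUFFIX)                         \
    static T max_##SUFFIX(const T *xs, size_t n) {    \
        T best = xs[0]; /* caller guarantees n > 0 */ \
        for (size_t i = 1; i < n; i++)                \
            if (xs[i] > best) best = xs[i];           \
        return best;                                  \
    }

/* "Instantiate" the template once per element type. */
DEFINE_MAX(int, int)
DEFINE_MAX(double, double)
```

Each expansion produces an ordinary, separately named function, which is also why this style gets unwieldy quickly compared to a language with real templates.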


Thanks for taking the time to respond! I have a few follow-up questions, if that's ok:

> You are correct in that the source code to the function being evaluated must be available to the compiler. This can be done with #include. I do it in D with importing the modules with the needed code.

> D's strategy is to separate the parse from the semantic analysis. I suppose it is a hair slower, but it also doesn't have to recompile the duplicate declarations and fold them into one.

I don't quite follow all the implications of these statements. Does the compiler have a different way of handling a translation unit?

- Is a translation unit the same as in C, but since you're #including the file, you would expect a re-included C file to be compiled multiple times? Wouldn't this bloat the resulting executable (or the bundle, in the case of a library)?

- Are multiple translation units compiled at a time? Wouldn't this mean that the entire translation dependency graph would need to be recompiled simultaneously? Wouldn't this inhibit parallelization? How would it handle recompilation? What happens if a dependency is already compiled? Would it recompile it?

> Performance

I think a lot of this is tied to my question about compilation/translation units above, but in my past experience we have had "header hygiene" rules that force us to use headers in a specific way (a simple example being: don't use #include in a header). If we follow them, we actually get really good preprocessor performance. How would you compare performance in these kinds of situations with a compiler that has no headers (i.e., one that either recompiles a full source file or looks up definitions from compiled source)?

> If you're using hacks to do templating in C, you've outgrown the language and need a more powerful one. D has top shelf metaprogramming - and as usual, other template languages are following in D's path.

Yes, as the performance question also demonstrates, we do a lot to work within the confines of what we have, when other tools would do a lot more of the heavy lifting for us. That is a fair criticism, but on the flip side, I don't have the power to make large decisions on an existing codebase like "let's switch languages" (even for a source file or two... I've tried), as much as I wish I could, so I have to work with what I have.

> I don't have the power to make large decisions on an existing codebase like "let's switch languages"

We struggled with that for a long time with D. And finally found a solution. D can compile Standard C source files and make all the C declarations available to the D code. When I proposed it, there was a lot of skepticism that this could ever work. But when it was implemented and debugged, it's been a huge win for D.

> Performance

With D you can put all your source files on one command line invocation. That means that imports are only read once, no matter how many times it is imported. This works so well D users have generally abandoned the C approach of compiling each file individually and then linking them together. A vast amount of time is lost in C/C++ compilation with simply reading the .h files thousands of times.
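A toy model of the point about a single command line: the sketch counts how many times modules get parsed when each source file is compiled in its own invocation (a fresh "already parsed" set per file, roughly what happens with #include even when include guards are used) versus when all files share one invocation. The graph layout and every name here are hypothetical; this is not how dmd is structured internally.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_MODULES 16
#define MAX_DEPS 4

/* Hypothetical toy dependency graph: deps[m] lists the modules that module m
   imports (or #includes); unused slots hold -1. */
typedef int DepGraph[MAX_MODULES][MAX_DEPS];

/* Parse m and everything it transitively imports; return how many modules
   were newly parsed. A module already marked in `parsed` is not read again. */
static int parse(DepGraph g, int m, bool parsed[MAX_MODULES]) {
    if (parsed[m]) return 0;
    parsed[m] = true;
    int count = 1;
    for (int d = 0; d < MAX_DEPS && g[m][d] >= 0; d++)
        count += parse(g, g[m][d], parsed);
    return count;
}

/* One compiler invocation per source file: the "already parsed" set is
   thrown away between files, so shared dependencies are parsed every time. */
static int parses_separate(DepGraph g, const int *roots, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        bool parsed[MAX_MODULES];
        memset(parsed, 0, sizeof parsed);
        total += parse(g, roots[i], parsed);
    }
    return total;
}

/* All source files on one command line: one shared set, so each module is
   parsed at most once no matter how many files import it. */
static int parses_single(DepGraph g, const int *roots, int n) {
    bool parsed[MAX_MODULES];
    memset(parsed, 0, sizeof parsed);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += parse(g, roots[i], parsed);
    return total;
}
```

For three source files (modules 0, 1, 2) that each import module 3, which in turn imports module 4, separate compilation performs 9 parses and the single invocation performs 5. The gap grows with the number of files and the size of the shared dependencies.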

Modules/imports are a gigantic productivity booster. They're not hard to implement, either. Except for the way C++ did it.

> Are multiple translation units compiled at a time? Wouldn't this mean that the entire translation dependency graph would need to be simultaneously recompiled? Wouldn't this inhibit parallelization? How would it handle recompilation? What happens if a dependency is already compiled? Would it recompile it?

Yes, yes, yes, yes. And yet, it still compiles faster! See what I wrote above about not needing to read the .h files thousands of times. Oh, and building one large object file is faster than building a hundred and having to link them together.

I know that in other languages, one obstacle for "just compile the C files" is that the target language might not have pointers and thus have difficulty representing things such as return-by-pointer.

I suppose in D this was less of an issue because D has pointers?

I'm not sure what you mean.

A foreign function interface that's based on parsing C files must translate C types and interfaces into types and interfaces of the target language. I suppose it helped that D's type system has many similarities with C, including support for pointers.

(The issue with return-by-pointer is that in C it's common to use the return value for an error code and use pointer arguments to pass data back to the caller. These are awkward to map to a target language that doesn't have pointers)
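A minimal example of the C idiom being described: the return value carries only an error code, and the actual result comes back through a pointer argument. `parse_port` and its error codes are hypothetical. A binding generator targeting a language without pointers would have to turn `out` into something else, such as a second return value or a mutable box.

```c
#include <stdlib.h>

/* Hypothetical error codes; 0 means success. */
enum { PARSE_OK = 0, PARSE_NOT_A_NUMBER = 1, PARSE_OUT_OF_RANGE = 2 };

/* Return-by-pointer: the function's return value is a status code, and the
   parsed port number is written through `out`, only on success. */
int parse_port(const char *text, int *out) {
    char *end;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return PARSE_NOT_A_NUMBER;  /* empty input or trailing garbage */
    if (value < 1 || value > 65535)
        return PARSE_OUT_OF_RANGE;
    *out = (int)value;
    return PARSE_OK;
}
```

A caller writes `int port; if (parse_port(s, &port) != 0) { /* handle error */ }`, so `port` is only meaningful after checking the status.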

> Is a translation unit the same as in C, but since you're #including the file you would expect multiple compilations of a re-included C file? Wouldn't this bloat the resulting executable (/ bundle in case of a library)

I think the idea is that compiling a translation unit produces two outputs: the object code (as it currently does), and an intermediate representation of the exported declarations. That could basically be a generated .h file, but it would probably be more efficient to use a different format. Then dependent translation units use those declaration files.

With this, you can still compile in parallel. You are constrained by the order of dependencies, but that is already kind of the case.

One complication is that ideally, if the signature doesn't change, but the implementation does, you don't need to re-compile dependent translation units. This is trivial if your build system detects changes based on content (like, say, bazel), but if it uses timestamps (like make) then the compiler needs to ensure the timestamp isn't updated when the declarations don't change.
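One way a compiler could keep the timestamp stable for make, sketched under the assumption that the compiler writes the declaration file itself: compare the new declarations with what is already on disk and skip the write when nothing changed. This is the common "write only if changed" pattern; `write_decls_if_changed` and the `.di`-style file name in the usage are hypothetical, not any particular compiler's code. A production version would also write to a temporary file and rename it to avoid partially written output.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the file at `path` exists and its contents equal `text`. */
static int file_has_contents(const char *path, const char *text) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    size_t len = strlen(text);
    int same = 1;
    for (size_t i = 0; i < len; i++) {
        int c = fgetc(f);
        if (c == EOF || (unsigned char)text[i] != c) { same = 0; break; }
    }
    if (same && fgetc(f) != EOF) same = 0;  /* file is longer than text */
    fclose(f);
    return same;
}

/* Write the exported declarations only if they differ from what is already
   on disk, so an unchanged interface keeps its old modification time and a
   timestamp-based tool such as make sees nothing new.
   Returns 1 if written, 0 if left untouched, -1 on I/O error. */
int write_decls_if_changed(const char *path, const char *decls) {
    if (file_has_contents(path, decls)) return 0;
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t len = strlen(decls);
    int ok = fwrite(decls, 1, len, f) == len;
    if (fclose(f) != 0) ok = 0;
    return ok ? 1 : -1;
}
```

Calling it twice with the same declarations writes once and then returns 0, so anything depending on the file's timestamp is not rebuilt.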

But this really isn't a new concept. Basically every modern compiled language works fine without needing separate header files.

> This is trivial if your build system detects changes based on content (like, say, bazel), but if it uses timestamps (like make) then the compiler needs to ensure the timestamp isn't updated when the declarations don't change.

This is where the traditional distinction of "compiler vs. Make" makes things harder: you want dependencies tracked at the declaration level rather than the file level. If the timestamp _and_ content of the exported declarations file change, but none of the _used_ declarations changed, then there's no more compilation to be done. At best, with file-level tracking, your build system will invoke the compiler for every downstream dependency, and each compiler invocation can decide whether there's any more work to be done.

The build system would need to know which declarations are used (and what a declaration is) to do better.
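A sketch of what declaration-level tracking could look like, assuming the compiler records, for each dependent, the names and content hashes of the declarations it actually used. The `Decl` record, `must_rebuild`, and the choice of 64-bit FNV-1a as the fingerprint are all illustrative assumptions, not an existing build tool's design.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: one exported declaration and a hash of its text. */
typedef struct {
    const char *name;
    uint64_t hash;
} Decl;

/* 64-bit FNV-1a hash, used here as a cheap content fingerprint. */
uint64_t fnv1a(const char *s) {
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* A dependent must be recompiled only if some declaration it actually used
   has disappeared or now has a different hash. Changes to declarations it
   never used are ignored, which file-level tracking cannot do. */
bool must_rebuild(const Decl *used, size_t n_used,
                  const Decl *current, size_t n_current) {
    for (size_t i = 0; i < n_used; i++) {
        bool found = false;
        for (size_t j = 0; j < n_current; j++) {
            if (strcmp(used[i].name, current[j].name) == 0) {
                if (used[i].hash != current[j].hash) return true; /* changed */
                found = true;
                break;
            }
        }
        if (!found) return true; /* removed */
    }
    return false;
}
```

If a module adds a new declaration `g` but leaves `f` alone, a dependent that only used `f` is left untouched, even though the declaration file's timestamp and content both changed.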

The D compiler has an option to generate a "header file" from D modules. It's called a .di file. It's useful if you want to hide the implementation from a compiler, as you would with libraries.

As it turned out, though, people found it too convenient to just import the .d file.

But as a very unexpected dividend, we discovered that the D compiler could generate .di files when compiling .c files, and realized that D had an inherent ability to translate C code to D code!!!! This has become rather popular.

Nice explanation. Modules are the way forward, and it looks like they always have been. I don't understand the resistance when the advantages are so clear.

I do understand the resistance. C is a simple, comfortable language, and its adherents want it to stay that way, warts and all.

But in the context of that, what baffles me is the additions to the C Standard, such as useless (but complicated!) things like normalized Unicode identifiers, things with very marginal utility like generic functions, etc. Why those and not forward declarations?
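For context on that last question, assuming "forward declarations" here refers to C's rule that a function must be declared before it is called: mutually recursive functions in C need a prototype up front, as in this minimal example, whereas D resolves names across the whole module and needs no such prototype. `is_even` and `is_odd` are made-up names.

```c
#include <stdbool.h>

/* Forward declaration: without it, the call to is_odd inside is_even would
   be a call to an undeclared function, which C99 removed from the language
   (compilers must diagnose it). */
bool is_odd(unsigned n);

/* Mutual recursion: each function calls the other on n - 1. */
bool is_even(unsigned n) { return n == 0 ? true : is_odd(n - 1); }
bool is_odd(unsigned n) { return n == 0 ? false : is_even(n - 1); }
```

Headers exist largely to carry declarations like the one above across files, which is why dropping the declare-before-use rule goes hand in hand with modules.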

Can't you use precompiled headers?

Interesting you brought that up. I implemented them for Symantec C and C++ back in the 90s.

I never want to do that again!

They are brittle and a maintenance nightmare. They did speed up compilations, though, but did not provide any semantic advantage.

With D I focused on fast compilation so much that precompiled headers didn't offer enough speedup to make them worth the agony.

>They are brittle and a maintenance nightmare

I happened to be reading the DMC source this week, and that hydrate/dehydrate stuff really is everywhere (which I assume is used solely for precompiled headers?)

Yup. I spent a crazy amount of time debugging that. The tiniest mistake was a big problem to find.

I had an intern try to use precompiled headers for the Linux kernel. The roadblock they found was that the command-line parameters used to compile the header must exactly match those of every translation unit in which it is used. This is not the case for the Linux kernel. We could compile the header multiple times, but the build complexity was not something we could overcome during the course of one internship.

> must exactly match

Yup. My compiler kept a list of which switches would perturb compilation and so would invalidate the precompiled header, and which did not.
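A rough sketch of that bookkeeping: classify each switch as perturbing or harmless, then allow reuse of a precompiled header only if the perturbing switches match. The prefix list below uses GCC-style spellings and is an assumption made up for illustration, not Symantec C's actual list; requiring the same order is a deliberately conservative choice, since the order of -D/-U and -I can matter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical switch prefixes that change how a header compiles (macros,
   include paths, language standard, optimization). A real compiler's list
   would be longer and compiler-specific. */
static const char *const PERTURBING[] = { "-D", "-U", "-I", "-std=", "-O" };

static bool perturbs(const char *arg) {
    for (size_t i = 0; i < sizeof PERTURBING / sizeof PERTURBING[0]; i++)
        if (strncmp(arg, PERTURBING[i], strlen(PERTURBING[i])) == 0)
            return true;
    return false;
}

/* A precompiled header built with `built` may be reused for a translation
   unit compiled with `now` only if both have the same perturbing switches in
   the same order; harmless switches (e.g. warning flags) are skipped. */
bool pch_reusable(const char *const *built, size_t n_built,
                  const char *const *now, size_t n_now) {
    size_t i = 0, j = 0;
    for (;;) {
        while (i < n_built && !perturbs(built[i])) i++;
        while (j < n_now && !perturbs(now[j])) j++;
        if (i == n_built || j == n_now)
            return i == n_built && j == n_now;
        if (strcmp(built[i], now[j]) != 0)
            return false;
        i++, j++;
    }
}
```

With this scheme, `-O2 -DNDEBUG -Wall` and `-Wextra -O2 -DNDEBUG` can share a precompiled header, while `-O2 -DDEBUG` cannot, which is exactly the situation that tripped up the Linux kernel attempt above.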

Precompiled headers are an awful, desperate feature. Good riddance.