| HN Mirror

> would have expected that it is highly useful to have a language interface that allows run-time (or at least compile-time) reflection

That's an interesting thought. I'm not sure I agree, maybe?

The user job is to express how to partition the problem. To do that properly, they need to know, e.g., "how many bytes can I fit in the next partition per unit-of-compute", so the langue/run-time has to tell them that.

I don't know if knowing the costs of memory access at each level or across levels is useful. It is a reasonable assumption that, at the next level, memory accesses are at least one order of magnitude faster, and that synchronizing across the hierarchy is very expensive and should be minimized. That's a good assumption to write your programs with, and knowing actual costs do not really help you there.

> Another subtle but important question is: what primitives should a language provide to access a memory level, and which ones to move between levels. An obvious choice is to treat each level as an array

I think CUDA's __shared__ memory is quite good. E.g. per kernel, a way to obtain memory in the level of the hierarchy that the kernel is currently working on. Nobody has extended CUDA to multiple levels because there hasn't been a need, but I expect these programs to be "recursive" in the sense that they recursively partition a problem, and for __shared__ memory on each recursion level to give you memory at a different level of the hierarchy.

To move memory across levels of the hierarchy, you would just use raw pointers, with appropriate reads/writes (e.g. atomic ones). The exact instructions that get emitted would then depend on which level of the hierarchy you are. For that the compiler needs to know something about the architecture it is compiling for, but that seems unavoidable.

> [1] Z. DeVito, J. Hegarty, A. Aiken, P. Hanrahan, J. Vitek, Terra: A Multi-Stage Language for High-Performance Computing. https://cs.stanford.edu/~zdevito/pldi071-devito.pdf

Thanks, I actually hadn't read the Terra paper. I'll try to get to it this weekend. I think that using a Lua-like language as the meta-programming for your language (e.g. Regent) is an interesting approach, but it is not necessary.

For example, Scheme, Lisp, Haskell, D and Rust have shown that you can do meta-programming quite well in the same language you do normal programming in, without having to learn a completely different "meta-language".

I particularly like Rust procedural macros. It is normal run-time Rust code that just get first compiled, and then executed (with your source code as input) during compilation, with some twist to make that quite fast (e.g. the compiler parses an AST, and the proc macros do AST->AST folds).

If one wants to delay that until run-time, one should just do so (e.g. you just embed your compiler in your app, and when your proc macros run, its "run-time"). No need to layer multiple languages, but maybe the Terra paper convices me otherwise.