by 0x07c0
3746 days ago
I wish that were true, but it's not what I'm seeing (I'm doing HPC). It's not about native C performance vs. some other language; it's about the low-level stuff you can do in C. You use AVX, and the compiler doesn't help much (it's supposed to, but doesn't do it very well), so you have to use intrinsics or asm. Then there's the memory stuff: cache blocking, alignment, non-temporal stores. Same for CUDA: the compiler doesn't get you that much performance. You have to think about all the low-level details, usually memory: alignment, whether to use shared memory or not, cache line size, etc. And then you're using multiple GPUs... No help from the compiler; you have to do it all yourself. It would be nice to have the compiler do it, and there are some compilers that help, but you don't get max performance. With some effort, the performance you get by hand-coding all this is much greater than what compilers can give you, and that advantage is increasing.
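To make the "memory stuff" concrete, here is a minimal sketch of two of the techniques mentioned, cache blocking and cache-line alignment, applied to a row-major matrix multiply in plain C. The tile size of 64 elements, the 64-byte line size and the function names are assumptions for illustration only; real tile sizes are tuned per machine, and this sketch leaves out the AVX intrinsics and non-temporal stores entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed tile edge (in elements); tune so the working tiles fit in cache. */
#define TILE 64
/* Assumed cache-line size in bytes (64 is typical on current x86 CPUs). */
#define LINE 64

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Allocate a zeroed n x n matrix of doubles aligned to a cache line. */
static double *alloc_matrix(size_t n) {
    size_t bytes = n * n * sizeof(double);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    bytes = (bytes + LINE - 1) / LINE * LINE;
    if (bytes == 0)
        bytes = LINE;
    double *m = aligned_alloc(LINE, bytes);
    if (m != NULL)
        memset(m, 0, bytes);
    return m;
}

/* Reference version: C += A * B, row-major, i-k-j loop order. */
static void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Cache-blocked version: same arithmetic, but it walks TILE x TILE
   sub-blocks so the pieces of A, B and C being reused stay in cache. */
static void matmul_blocked(size_t n, const double *a, const double *b, double *c) {
    assert(TILE > 0);
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t imax = min_sz(ii + TILE, n);
                size_t kmax = min_sz(kk + TILE, n);
                size_t jmax = min_sz(jj + TILE, n);
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

For each output element both versions add the `k` terms in the same order, so with integer-valued inputs they give bit-identical results; the blocked one only changes which data is in cache when. Whether it is actually faster depends on the matrix size and the cache hierarchy, which is exactly the kind of per-machine tuning the comment describes.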
I would argue that the low-level work you are doing should be done by a macro or a compiler.
http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl1.pd...
http://www.graphics.stanford.edu/~hanrahan/talks/dsl/dsl2.pd...
Pat Hanrahan makes a compelling argument for using special-purpose DSLs to generate efficient code that takes advantage of heterogeneous hardware.
See the design of Terra: http://terralang.org/snapl-devito.pdf
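As a very small illustration of the "do it in a macro" idea (not Terra, and not anything from Hanrahan's talks), the sketch below keeps the tiling schedule inside a hypothetical `TILED_FOR2` macro, so the kernel itself is written as if it were a plain double loop. The macro name, the tile size of 32 and the transpose kernel are all assumptions chosen for illustration; a real DSL or compiler would choose the schedule per target instead of hard-coding it.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SZ(x, y) ((x) < (y) ? (x) : (y))

/* Hypothetical macro: visits the n x n index space (i, j) tile by tile.
   The schedule (tile shape, loop order) lives here; the body is written
   once, as a plain statement. Variadic so the body may contain commas. */
#define TILED_FOR2(i, j, n, tile, ...)                                      \
    for (size_t i##_blk = 0; i##_blk < (n); i##_blk += (tile))              \
        for (size_t j##_blk = 0; j##_blk < (n); j##_blk += (tile))          \
            for (size_t i = i##_blk; i < MIN_SZ(i##_blk + (tile), (n)); i++) \
                for (size_t j = j##_blk; j < MIN_SZ(j##_blk + (tile), (n)); j++) \
                    { __VA_ARGS__ }

/* The algorithm: an out-of-place transpose of a row-major n x n matrix.
   Changing the tile size or the macro changes the memory-access pattern
   without touching this line of "algorithm" code. */
static void transpose(size_t n, const double *in, double *out) {
    assert(in != out); /* out-of-place only */
    TILED_FOR2(i, j, n, 32, out[j * n + i] = in[i * n + j];)
}
```

This is the weakest form of the separation the linked talks argue for: the preprocessor only pastes text, so it cannot pick tile sizes, vectorize, or retarget to a GPU, which is where a real DSL or staged compiler earns its keep.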