Hacker News
by dsharlet 291 days ago
The problem I've seen is this: in order to get good performance, no matter what language you use, you need to understand the hardware and how to use the instructions you want to use. It's not enough to know that you want to use tensor cores or whatever; you also need to understand the myriad low-level requirements they have.

Most people who know this kind of thing don't get much value out of using a high-level language to do it, and it's a huge risk: if the language fails to generate the code you want, you're stuck until a compiler team fixes it and ships a patch, which could take weeks or months. Even extremely fast bug fixes are still extremely slow on the timescales people want to work on.

I've spent a lot of my career trying to make high-level languages for performance work well, and I've basically decided that the sweet spot for me is C++ templates: I can get the compiler to generate a lot of good code concisely, and when it fails, the escape hatch of just writing some architecture-specific intrinsics is right there whenever it is needed.
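
To make the "templates plus escape hatch" pattern concrete, here is a minimal sketch. It is not taken from the comment above: the `dot` function, the `HAVE_SSE` macro, and the choice of SSE as the target are all assumptions made for illustration. The generic template is left to the compiler, and a `float` specialization drops to intrinsics only where the build targets x86 with SSE.

```cpp
#include <cstddef>

// Hypothetical feature macro for this sketch: use SSE when the compiler says
// it is available, otherwise fall back to the generic template everywhere.
#if defined(__SSE__) || defined(_M_X64)
#include <immintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Generic path: one concise definition for every element type. The compiler
// is free to unroll and auto-vectorize this loop as it sees fit.
template <typename T>
inline T dot(const T* a, const T* b, std::size_t n) {
    T sum = T(0);
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

#if HAVE_SSE
// Escape hatch: a full specialization for float written with SSE intrinsics.
// Callers still write dot(a, b, n); only the generated code changes.
template <>
inline float dot<float>(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    // Process four floats per iteration using unaligned loads.
    for (; i + 4 <= n; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, prod);
    }
    // Horizontal sum of the four accumulator lanes.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
#endif
```

Call sites look the same either way, e.g. `dot(x, y, n)` for `float` or `double` arrays. Note that the intrinsic version sums in a different order than the scalar loop, so floating-point results can differ in the last bits for general inputs.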

1 comments

The counterpoint to this is that a language with a graceful slide between Python-like flexibility and hand-optimized assembly is really useful. The thing I like most about Julia is that it is very easy to write fast, somewhat sloppy code (e.g. for exploring new algorithms), and then you can go through and tune it easily for maximal performance and get as fast as anything out there.
> easily for maximal performance and get as fast as anything out there.

Optimizing Julia is much harder than optimizing Fortran or C.

For equal LOC, sure. For equal semantics, less true.