Hacker News
by bruce343434 63 days ago
What does it mean to be friendly to memory bandwidth, and why does C++ excel at it, over, say, Fortran or C or Rust?
3 comments

Actually, C, FORTRAN and C++ are all friendly to memory bandwidth when written correctly.

C++ is better than FORTRAN because, while FORTRAN is still being developed and is quite fast, doing anything outside what core FORTRAN is good at is hard. At the end of the day, it computes and works well with MPI. That's mostly all.

C++ is better than C because it can accommodate C code inside it, has many more convenience functions and libraries available, and modern C++ can be written more concisely than C, with minimal or no added overhead.

Also, all three languages are studied so well that advanced programmers can look at a piece of code and say, "I can fit that into the cache, that'll work, that's fine".

"More modern" programming languages really solve no urgent problems in the HPC space, and the current code works quite well there.

Reported from another HPC datacenter somewhere in the universe.

I suppose that most HPC problems are embarrassingly parallel™, and have very little if any mutable shared state?
I'd say that the opposite is more often the reality, which is why HPC systems tend to have high-bandwidth, low-latency networks.
High bandwidth may mean the need to consult some very large but immutable data structure. As a trivial example, multiplying two matrices requires accessing each matrix fully multiple times over, but neither of them is altered in the process, so it can safely be done in parallel. Recording the result of a (naive) matrix multiplication can also be done without programmatic coordination, because each element is only updated once, independently from others.
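To make the matrix example above concrete, here is a minimal sketch in C++, assuming square row-major matrices and plain std::thread workers. The function name matmul_parallel and its parameters are made up for illustration, not taken from any library. Both inputs are only read, and each worker writes its own band of rows, so no locks or atomics are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Naive C = A * B for row-major n x n matrices (hypothetical helper).
// A and B are read-only; each worker owns a disjoint band of rows of C.
std::vector<double> matmul_parallel(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t n,
                                    unsigned num_workers) {
    assert(a.size() == n * n && b.size() == n * n);
    assert(num_workers > 0);
    std::vector<double> c(n * n, 0.0);

    auto worker = [&](std::size_t row_begin, std::size_t row_end) {
        for (std::size_t i = row_begin; i < row_end; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                double sum = 0.0;
                for (std::size_t k = 0; k < n; ++k) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;  // written exactly once, by one thread
            }
        }
    };

    // Split the rows into contiguous bands, one band per thread.
    const std::size_t rows_per_worker = (n + num_workers - 1) / num_workers;
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += rows_per_worker) {
        const std::size_t end = std::min(n, begin + rows_per_worker);
        threads.emplace_back(worker, begin, end);
    }
    for (auto& t : threads) {
        t.join();
    }
    return c;
}
```

Calling matmul_parallel(a, b, n, 4) returns the product as a new vector. With GCC or Clang this typically needs -pthread. A real HPC kernel would block for cache and use a tuned BLAS; this only illustrates the "no coordination needed" point.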

This is very unlike, say, a database engine, where mutations occur all the time and may come from multiple threads.

Rust specifically makes it hard to impossible to clobber shared mutable state, e.g. to produce a dangling pointer. But this is not a problem that our matrix-multiplication example would have, so it won't benefit from being implemented in Rust. Maybe this applies to more classes of HPC problems.

HPC infrastructure is not like what you're used to using. It is very high bandwidth, but latency depends on where your data lives. There are a lot more layers that complicate things, and each layer has a very different I/O speed.

https://extremecomputingtraining.anl.gov/sites/atpesc/files/...

Also, how you handle the data can be very different. Just see how libraries like this work. They take advantage of those burst buffers and try to minimize what's being pulled from storage. There's also a lot of memory management in the code people write to do all this complex stuff, so that you aren't waiting around for disks... or worse... tape.

https://adios-io.org/applications/

On the contrary. However, they tend to manually manage memory rather than outsourcing it to a language runtime or a distributed key-value store.
I'd say it's being able to structure your data however suits your problem and your hardware, then being able to look at a profile and map reads/writes back to source. Both C and C++ excel at this.

The advantage of C++ over C is that, with care, you can write zero-cost abstractions over whatever mess your data ends up as, and make the API still look intuitive. C isn't as good here.
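As a rough sketch of what that can look like, assuming a made-up Particles type (nothing here comes from a real library): the data lives as structure-of-arrays so a kernel over x streams through contiguous memory, while callers still write p[i].x as if it were an array of structs. Whether the proxy really costs nothing depends on the compiler inlining it, which optimizing builds usually do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example type: positions stored as structure-of-arrays (SoA).
class Particles {
public:
    explicit Particles(std::size_t count) : x_(count), y_(count), z_(count) {}

    std::size_t size() const { return x_.size(); }

    // Lightweight proxy of references, so callers can write p[i].x = ...
    struct Ref {
        double& x;
        double& y;
        double& z;
    };

    Ref operator[](std::size_t i) {
        assert(i < size());
        return Ref{x_[i], y_[i], z_[i]};
    }

    // Bandwidth-friendly kernel: touches only the contiguous x values,
    // never pulling y or z into cache.
    double sum_x() const {
        double s = 0.0;
        for (double v : x_) {
            s += v;
        }
        return s;
    }

private:
    std::vector<double> x_, y_, z_;
};
```

The layout can later change (say, to blocked or aligned storage) without touching call sites that only use operator[] and sum_x().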

In my experience, all the "zero-cost abstractions" from C++ most of the time make the code more annoying to maintain and/or understand, especially with respect to resource management, introduce compatibility issues at the toolchain level, and, even when looking perfect in toy benchmarks, are often not even zero-cost (e.g. the bloat that templates generate often hurts).
Is Fortran 90 not flexible enough in defining data layout?
The parent is talking about new languages; as per the article, Fortran and C are doing fine. I speculate that the benefit of C++ over Rust is how it lets programmers give the compiler guarantees that go beyond the initial semantics of the language. See __restrict, __builtin_prefetch and __builtin_assume_aligned. The programming language is a space for conversations between compiler builders and hardware designers.
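For anyone who hasn't seen those three used, here is a minimal sketch with GCC/Clang, assuming a hypothetical axpy_hinted function and 64-byte-aligned inputs. The prefetch distance of 64 elements is an arbitrary illustration, not a tuned value, and for a simple linear stream like this the hardware prefetcher usually does the job anyway.

```cpp
#include <cassert>
#include <cstddef>

// GCC/Clang extensions, not standard C++.
// Computes y[i] += alpha * x[i]. The caller promises that:
//   - x and y do not overlap (__restrict),
//   - both pointers are 64-byte aligned (__builtin_assume_aligned).
// Breaking either promise is undefined behavior; nothing checks it.
void axpy_hinted(double alpha,
                 const double* __restrict x,
                 double* __restrict y,
                 std::size_t n) {
    const double* xa =
        static_cast<const double*>(__builtin_assume_aligned(x, 64));
    double* ya = static_cast<double*>(__builtin_assume_aligned(y, 64));

    const std::size_t ahead = 64;  // assumed prefetch distance, in elements
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n) {
            // Hint only: request x[i + ahead] for reading (rw = 0)
            // with low temporal locality (locality = 0).
            __builtin_prefetch(xa + i + ahead, 0, 0);
        }
        ya[i] += alpha * xa[i];
    }
}
```

The hints never change the result for valid inputs; they only widen what the optimizer is allowed to assume, which is why a wrong promise fails silently rather than at compile time.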
It is just super unpleasant to write low-level software in Rust.

There is a colossal ergonomics difference if you compare using clang vs. Rust for writing a hashmap, for example.

C compilers just have everything you can think of because everything is first implemented there.

Using anything else just seems kind of pointless. I understand new languages do have benefits but I don't believe language matters that much really.

The person who writes that garbage pointer soup in C writes Arc<> + multi-threaded + macro garbage soup in Rust.

I believe __restrict, and __builtin_prefetch/__builtin_assume are compiler extensions, not part of the C++ language as is, and different compilers implement (or don't) these differently.

The Rust compiler actually has similar things, but they're not available in stable builds. I suppose there are some issues of principle behind not including them in stable. E.g.: https://doc.rust-lang.org/std/intrinsics/fn.prefetch_read_da...

Maybe some time in the future good, acceptable abstractions will be conceived for them. Perhaps just using nightly builds for HPC is not that far out, though.

Rust already has __restrict; it is spelled &mut and is one of the most fundamental parts of the language. The key difference, of course, is that it's checked by the compiler, so is useful for correctness and not just performance. Also, for a long time it wasn't used for optimization, because the corresponding LLVM feature (noalias) was full of miscompilation bugs, because not that much attention was being paid to it, because hardly anyone actually uses restrict in C or __restrict in C++. But those days are finally over.

The equivalent of __builtin_assume is available on stable (though of course it's unsafe): https://doc.rust-lang.org/std/hint/fn.assert_unchecked.html

There's an open issue to stabilize the prefetch APIs: https://github.com/rust-lang/rust/issues/146941 As is usually the case when a minor standard-library feature remains unstable, the primary reason is that nobody has found the problem urgent enough to put in the required work to stabilize it. (There's an argument that this process is currently too inefficient, but that's a separate issue.) In the meantime, there are third-party libraries available that use inline assembly to offer this functionality, though this means they only support a couple of the most popular architectures.

Btw, Fortran implicitly behaves as "restrict" by default, which makes sense together with its intuitive "intent" system for function/subroutine arguments. This is one of the biggest reasons why it's still so popular in HPC: scientists can pretty much just write down their equations, follow a few simple rules (e.g. on storage order) and out comes fairly performant machine code. Doing the same (a 'naive' first implementation) in C or C++ usually leads to something severely degraded compared to the theoretical limits of a given algorithm on given hardware.
Oh, I actually made an editing mistake: I meant to say that Rust also has restrict by default, by virtue of all references being unique xor read-only.

As I understand it, the Fortran compiler just expects your code to respect the "restrictness", it doesn't enforce it.

So that's where the intent system comes in (an argument can be in/out/inout) as well as the built-in array sizes, because it allows you to express what you want and then the compiler will enforce it. In Fortran you kinda have to work hard to invade the memory of one array from another, as they are allocated as distinct memory regions with their own space from the beginning. Pointer math is almost never necessary. Because there is built-in support for multidimensional arrays and array lengths, arrays are internally built as flat memory regions anyway, the same way you'd lay out C arrays for good performance (i.e. cache locality), but with simple indices to address them. This then makes it unnecessary to treat memory as aliased by default.
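For comparison, this is roughly what you end up hand-rolling in C++ to get the same flat layout with simple indices. It is only a sketch: Matrix2D and sum_all are made-up names, storage is column-major to mirror Fortran's order, and indices are 0-based here, whereas Fortran arrays are 1-based by default.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a 2D array kept in one flat, column-major buffer.
class Matrix2D {
public:
    Matrix2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Element (i, j) lives at offset i + j * rows: a column is contiguous.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i + j * rows_];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i + j * rows_];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const double* data() const { return data_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// The first index runs in the inner loop, so memory is walked linearly.
double sum_all(const Matrix2D& m) {
    double s = 0.0;
    for (std::size_t j = 0; j < m.cols(); ++j) {
        for (std::size_t i = 0; i < m.rows(); ++i) {
            s += m(i, j);
        }
    }
    return s;
}
```

Swapping the two loops in sum_all gives the same answer but strides through memory by rows() elements per step, which is the kind of storage-order rule mentioned earlier in the thread.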

Honestly, I still don't get why people have built up all these complex numerics frameworks in C and C++. Just use Fortran: it's built for exactly this use case, and scientists will still be able to read your code without a CS degree. In fact, they'll probably be the ones writing it in the first place.

There are good reasons to use Fortran, some having to do with the language and many to do with legacy codes. These have to be balanced with the good reasons to avoid using Fortran for new development, which also have to do with the language and its compilers.
restrict is in C99. I’m not sure why standard C++ never adopted it, but I can guess: it can be hard to reason about two restrict’d pointers in C, and it probably becomes impossible when it interacts with other C++ features.

The rest are compiler extensions, but if you’re in the space you quickly learn that portability is valued far less than program optimization. Most of the point of your large calculations is the actual results themselves, not the code that got you there. The code needs to be correct and reproducible, but HPC folks (and grant funding agencies) don’t care if your Linux/amd64 program will run, unported, on Windows or on arm64. Or whether you’ve spent time making your kernels work with both ROCm and CUDA.