by formerly_proven 61 days ago
Idiomatic/natural Rust tends to be a lot heavier on allocations, and on physically moving objects around, than the other two.

Can you elaborate on this? I'm slightly concerned because I have written (and am planning to write more) Rust HPC code.
Maybe not what they meant, but Rust sometimes makes it tempting to just copy things rather than fighting the borrow checker. Whereas in C++ you're free to just pass pointers around and not worry about it until / unless your code crashes or gets exploited.

Speaking authoritatively from my position as an incompetent C++ / Rust dev.

I see. Fortunately, I'm aware of that and I don't use clone unless I intend to. The borrow checker is usually not a problem when writing scientific/HPC code.

Because passing pointers isn't as ergonomic in Rust, I do things in an arena-based way (for example, when setting up quadtrees or octrees). Is that part of the issue when it comes to memory bandwidth?
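
For readers unfamiliar with the arena-based style mentioned above, here is a minimal sketch of a quadtree whose nodes live in one `Vec` and refer to each other by index rather than by pointer. The names (`QuadTree`, `Node`, `NodeId`, `max_depth`) and the "always subdivide down to `max_depth`" policy are assumptions made for illustration, not anyone's actual code.

```rust
/// Index of a node inside the arena: a plain integer instead of a pointer.
type NodeId = usize;

#[derive(Debug, Clone, Copy)]
struct Node {
    // Bounding square, described by its center and half-width.
    cx: f64,
    cy: f64,
    half: f64,
    // Indices of the four children, or None for a leaf.
    children: Option<[NodeId; 4]>,
    // Number of inserted points that passed through this node.
    count: usize,
}

struct QuadTree {
    nodes: Vec<Node>, // the arena: all nodes stored contiguously
    max_depth: usize, // hypothetical policy: always descend this many levels
}

impl QuadTree {
    fn new(cx: f64, cy: f64, half: f64, max_depth: usize) -> Self {
        let root = Node { cx, cy, half, children: None, count: 0 };
        QuadTree { nodes: vec![root], max_depth }
    }

    /// Quadrant index 0..4: bit 0 is set for x >= center, bit 1 for y >= center.
    fn quadrant(node: &Node, x: f64, y: f64) -> usize {
        (x >= node.cx) as usize | (((y >= node.cy) as usize) << 1)
    }

    /// Append four children to the arena and link them to node `id`.
    fn split(&mut self, id: NodeId) -> [NodeId; 4] {
        let Node { cx, cy, half, .. } = self.nodes[id];
        let h = half / 2.0;
        let first = self.nodes.len();
        for q in 0..4 {
            let dx = if q & 1 == 1 { h } else { -h };
            let dy = if q & 2 == 2 { h } else { -h };
            self.nodes.push(Node { cx: cx + dx, cy: cy + dy, half: h, children: None, count: 0 });
        }
        let kids = [first, first + 1, first + 2, first + 3];
        self.nodes[id].children = Some(kids);
        kids
    }

    /// Insert a point, subdividing along its path down to `max_depth`.
    fn insert(&mut self, x: f64, y: f64) {
        let mut id: NodeId = 0;
        for _ in 0..self.max_depth {
            self.nodes[id].count += 1;
            let existing = self.nodes[id].children; // copied out, so no borrow is held
            let kids = match existing {
                Some(k) => k,
                None => self.split(id),
            };
            id = kids[Self::quadrant(&self.nodes[id], x, y)];
        }
        self.nodes[id].count += 1;
    }
}
```

Because a split pushes all four children at once, siblings sit next to each other in memory, and the whole tree is freed in one go when the `Vec` is dropped. Whether that layout helps or hurts memory bandwidth depends on the traversal order.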

Stable Rust doesn't have a local allocator construct yet; you can only change the global allocator or use a separate crate to provide a local equivalent.
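
As a rough sketch of what "changing the global allocator" looks like on stable Rust: the standard library's `GlobalAlloc` trait and the `#[global_allocator]` attribute let you swap in your own type for the whole program. The wrapper below just forwards to the system allocator and keeps running totals; `CountingAlloc`, `bytes_requested` and `alloc_calls` are made-up names for this example.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts requests.
struct CountingAlloc;

static BYTES_REQUESTED: AtomicUsize = AtomicUsize::new(0);
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Only counters are touched here; allocating inside alloc would recurse.
        BYTES_REQUESTED.fetch_add(layout.size(), Ordering::Relaxed);
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every Box, Vec, String, ... in the program now goes through CountingAlloc.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Total bytes ever requested from the allocator (monotonically increasing).
fn bytes_requested() -> usize {
    BYTES_REQUESTED.load(Ordering::Relaxed)
}

/// Total number of allocation calls so far.
fn alloc_calls() -> usize {
    ALLOC_CALLS.load(Ordering::Relaxed)
}
```

A counter like this is a cheap way to check the "heavier on allocations" claim against your own hot loops: read `alloc_calls()` before and after a kernel and see whether the number moves.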
Right. I have seen Zig, where one needs to specify allocators as well. I'm sorry, I'm not well versed enough to know how that makes things better for HPC, though?

For now my plan is to write code in much the same style as one would in C++/Fortran, using MPI bindings in Rust.

if you're using thread-level parallelism, there is always a benefit to having a per-thread allocator so that you don't have to take global locks to get memory; those locks become highly contended.
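
A very reduced illustration of the per-thread idea, using only the standard library: this is not a real allocator, just a thread-local scratch buffer that each worker reuses, so the hot path stops going back to the shared allocator once the buffer has grown. `SCRATCH`, `sum_of_squares` and `parallel_sum_of_squares` are hypothetical names.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // One scratch buffer per thread: no lock, no sharing between threads.
    static SCRATCH: RefCell<Vec<f64>> = RefCell::new(Vec::new());
}

/// Uses the calling thread's scratch buffer for its temporary data.
/// After the buffer has grown once, repeated calls on the same thread
/// reuse its capacity instead of allocating again.
fn sum_of_squares(input: &[f64]) -> f64 {
    SCRATCH.with(|cell| {
        let mut buf = cell.borrow_mut();
        buf.clear(); // keeps the capacity, drops the old contents
        buf.extend(input.iter().map(|x| x * x));
        buf.iter().sum::<f64>()
    })
}

/// Splits `data` into chunks and processes each chunk on its own scoped thread.
fn parallel_sum_of_squares(data: &[f64], n_threads: usize) -> f64 {
    let n_threads = n_threads.max(1);
    let chunk = ((data.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| s.spawn(move || sum_of_squares(c)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
    })
}
```

The payoff shows up with a long-lived thread pool, where the same worker calls `sum_of_squares` many times; with short-lived threads as in this toy driver, each buffer is only used once.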

if you take that one step further and only use those objects on a single core, now your default model is lock-free non-shared objects. at large scale that becomes kind of mandatory. some large shared-memory machines even forgo cache coherence because you really can't do it effectively at large scale anyway.

but all of this is highly platform dependent, and I wouldn't get too wrapped up in it to begin with. I would encourage you though to worry first about expressing your domain semantics, with the understanding that some refactoring for performance will likely be necessary.

if you have the patience, both personally and within the project, it can be a lot of fun to really get in there and think about the necessary dependencies and how they can be expressed on the hardware. there are a lot of cool tricks, for example trading off redundant computation to reduce the frequency of communication.
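
One common form of that last trick, sketched under heavy assumptions: a 1D three-point averaging stencil split across two simulated "ranks", each holding `k` extra halo cells beyond the cells it owns. With a halo of width `k`, a rank can run `k` sweeps before its owned cells would be contaminated by stale halo data, so halos only need refreshing every `k` sweeps, at the cost of redundantly updating the halo cells. There is no MPI here; the "exchange" is a copy through a shared array, and `sweep`, `serial` and `two_ranks` are names invented for this example.

```rust
/// One sweep: each interior cell becomes the mean of itself and its two
/// neighbours. The first and last cell of the slice are left untouched.
fn sweep(u: &mut [f64]) {
    let old = u.to_vec();
    for i in 1..old.len() - 1 {
        u[i] = (old[i - 1] + old[i] + old[i + 1]) / 3.0;
    }
}

/// Reference result: the whole field swept `steps` times in one piece.
fn serial(init: &[f64], steps: usize) -> Vec<f64> {
    let mut u = init.to_vec();
    for _ in 0..steps {
        sweep(&mut u);
    }
    u
}

/// Two simulated ranks with halo width `k`, exchanging every `k` sweeps.
/// Returns the final field and the number of halo exchanges performed.
fn two_ranks(init: &[f64], steps: usize, k: usize) -> (Vec<f64>, usize) {
    let n = init.len();
    let mid = n / 2; // left rank owns [0, mid), right rank owns [mid, n)
    assert!(k >= 1 && k <= mid && mid + k <= n);
    let mut global = init.to_vec();
    let (mut done, mut exchanges) = (0, 0);
    while done < steps {
        // "Communication": each rank gets its owned cells plus k halo cells.
        let mut left = global[..mid + k].to_vec();
        let mut right = global[mid - k..].to_vec();
        exchanges += 1;
        // Each sweep invalidates one more halo cell, starting from the cut
        // edge, so up to k sweeps are safe before owned cells are affected.
        let batch = k.min(steps - done);
        for _ in 0..batch {
            sweep(&mut left); // includes redundant work on the halo cells
            sweep(&mut right);
        }
        // Keep only the owned cells; the halos are discarded as stale.
        global[..mid].copy_from_slice(&left[..mid]);
        global[mid..].copy_from_slice(&right[k..]);
        done += batch;
    }
    (global, exchanges)
}
```

For 12 sweeps, `k = 1` needs 12 exchanges while `k = 4` needs 3, and both give the same field as `serial`. Whether the wider halo wins in practice depends on message latency versus the cost of the extra cells, which is exactly the platform-dependent part.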

Thank you for such a great reply!

There's a lot of useful advice here that'll surely come in handy later. For now, yeah, I'm just going to try to make things work. So far I have mostly written intra-node code, for which rayon has been adequate. I haven't gotten around to testing the ergonomics of rs-mpi yet, but it feels like quite an exciting prospect for sure.