	by j4k0bfr 106 days ago
	I'm pretty interested in realtime computing and didn't realise C++ was considered bandwidth efficient! Coming from C, I find myself avoiding most 'new' C++ features because I can't easily figure out how they allocate without grabbing a memory profiler.
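One low-tech way to see whether a C++ construct allocates, short of a memory profiler, is to replace the global `operator new` with a counting version. The following is a minimal sketch, not a definitive tool: `g_alloc_count` and `count_allocations` are hypothetical names made up for illustration, and only the plain (non-aligned, non-array-specific) overloads are replaced.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical global counter, bumped on every call to the replaced operator new.
static std::atomic<std::size_t> g_alloc_count{0};

// Replacement for the global allocation function: count, then defer to malloc.
void* operator new(std::size_t size) {
    g_alloc_count.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc{};
}

// Matching deallocation functions (plain and sized variants).
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical helper: runs fn and returns how many times operator new was hit.
template <typename Fn>
std::size_t count_allocations(Fn&& fn) {
    const std::size_t before = g_alloc_count.load(std::memory_order_relaxed);
    fn();
    return g_alloc_count.load(std::memory_order_relaxed) - before;
}
```

Wrapping a call, for example `count_allocations([&] { v.reserve(100); })`, reports how many heap allocations that call made. Optimizers are allowed to elide some allocations, so counts may differ between debug and release builds.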


bayindirh 106 days ago

You can always go through cachegrind or perf and see what happens with your code.

I managed to reach the practical IPC limits of the hardware I was running on, and while I could theoretically have made the prefetcher happier with some matrix reordering, looking back, I'm not sure how much performance it would have provided, since the FPU was already saturated at that point.

GuB-42 106 days ago

C++ is like C with extra features, but you don't need to use them.

If you want control over your memory, you can do pointers the C way, but you still have features like templates, namespaces, etc... Another advantage of C++ is that it can go both high and low level within the same language.

Disadvantage of C++ is mostly related to portability and interop. Things like name mangling, constructors, etc... can be a problem. Also, C++ officially doesn't support some C features like "restrict". In practice, you often can use them, but it is nonstandard. Probably not a concern for HPC.
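To illustrate the "restrict" point: a minimal sketch of how the nonstandard spelling is usually wrapped. `__restrict__` (GCC/Clang) and `__restrict` (MSVC) are compiler extensions, not standard C++; the macro name `HPC_RESTRICT` and the function `axpy` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>

// Map a project-local macro onto whichever extension the compiler offers.
#if defined(__GNUC__) || defined(__clang__)
#define HPC_RESTRICT __restrict__
#elif defined(_MSC_VER)
#define HPC_RESTRICT __restrict
#else
#define HPC_RESTRICT  // unknown compiler: fall back to no aliasing hint
#endif

// y[i] += a * x[i]; the qualifier promises the compiler that x and y do not alias.
void axpy(std::size_t n, float a,
          const float* HPC_RESTRICT x, float* HPC_RESTRICT y) {
    assert(x != y);  // the promise is the caller's responsibility
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Whether the hint changes the generated code depends on the compiler and flags, so it is worth checking the output rather than assuming.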

bch 106 days ago

> C++ is like C with extra features, but you don't need to use them

C++ certainly (literally (Cfront[0])) used to be this, but I thought modern (decade or more) conventional wisdom is to NOT think like this anymore. Curious to hear others weigh in.

[0] https://en.wikipedia.org/wiki/Cfront

GuB-42 106 days ago

To me, it is not "conventional wisdom", it is the view of a vocal group of C++ guys who look at Rust and its memory safety and don't want to be left out.

Their way is not wrong; new constructs are indeed safer, more powerful, etc... But if you are only in it for the new stuff, why use C++ at all? You are probably better off with Rust or something more modern. The strength of C++ is that it can do everything, including C; there is no "right" way to use it. If you need raw pointers, use raw pointers. If you need the fancy constructs the STL provides, use them. These are all supported features of the language, so don't let someone else, who may be working in a completely different field, tell you that you shouldn't use them.

ablob 106 days ago

C++ by comparison doesn't stand in your way too much either. I feel like the biggest gripe Rust has is what happens when you do have to go unsafe. That seems to be a strong point of contention for many folks. Maybe all the reasons that lead people to use unsafe rust go away or the attitude about it shifts in some manner.

For me Rust turned out to be less interesting after I saw the whole ceremony about typing. The amount of things I had to grasp just to get a glimpse into what a library does felt much more involved than any of the things I did with C++. The whole annotation thing feels much less necessary and more like a proper opt-in there.

grg0 106 days ago

C++ comes with baggage and requires up-front training. You need to dive into every language feature and STL library, learn how compilers implement stuff, then decide what to use and what not to, and the decision often depends on context. It has a high cognitive load in my opinion for that reason. But once you do that, you get a relatively high-level language that can go as low and be as fast as C.

Narishma 106 days ago

I don't think there's much difference between C and C++ (and Rust, etc...) when it comes to this.

Joel_Mckay 106 days ago

There is, unless you are using an LLVM compiler that does naive things with code motion.

Rust is typically slowest (often negligible <3%), C++ has better CUDA support, and C can be heavily optimized with inline assembly (very unforgiving to juniors.)

Also, heavily associated with coding style =3

https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...

formerly_proven 106 days ago

Idiomatic/natural rust tends to be a lot heavier on allocations and also physically moving objects around than the other two.

kmaitreys 106 days ago

Can you elaborate on this? Slightly concerned because I have written (and am planning to write more) Rust HPC code.

Joeboy 106 days ago

Maybe not what they meant, but Rust sometimes makes it tempting to just copy things rather than fighting the borrow checker. Whereas in C++ you're free to just pass pointers around and not worry about it until / unless your code crashes or gets exploited.

Speaking authoritatively from my position as an incompetent C++ / Rust dev.

kmaitreys 106 days ago

I see. Fortunately, I'm aware of that and I don't use clone (unless I intend to) as much. Borrow checker is usually not a problem when writing scientific/HPC code.

Because passing pointers isn't as ergonomic in Rust, I do things in an arena-based way (for example setting up quadtrees or octrees). Is that part of the issue when it comes to memory bandwidth?
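For readers unfamiliar with the arena-based layout mentioned above, here is a minimal sketch of an index-based quadtree arena, written in C++ to match the rest of the thread rather than Rust. `QuadNode`, `QuadArena`, and `kNoChild` are hypothetical names; the sketch only shows the storage scheme (one contiguous vector, 32-bit indices instead of pointers) and makes no claim about measured bandwidth.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kNoChild = 0xFFFFFFFFu;  // sentinel: "no child here"

struct QuadNode {
    float cx, cy, half;                   // cell centre and half-width
    std::array<std::uint32_t, 4> child;   // indices into the arena, not pointers
};

class QuadArena {
public:
    explicit QuadArena(std::size_t expected) { nodes_.reserve(expected); }

    // Appends a leaf node and returns its index.
    std::uint32_t make_node(float cx, float cy, float half) {
        nodes_.push_back(QuadNode{cx, cy, half,
                                  {kNoChild, kNoChild, kNoChild, kNoChild}});
        return static_cast<std::uint32_t>(nodes_.size() - 1);
    }

    // Splits a node into four children stored contiguously at the end.
    void split(std::uint32_t parent) {
        assert(parent < nodes_.size());
        const QuadNode p = nodes_[parent];  // copy: push_back may reallocate
        const float h = p.half / 2;
        const float dx[4] = {-h, h, -h, h};
        const float dy[4] = {-h, -h, h, h};
        for (int q = 0; q < 4; ++q) {
            const std::uint32_t c = make_node(p.cx + dx[q], p.cy + dy[q], h);
            nodes_[parent].child[q] = c;    // re-index after possible growth
        }
    }

    const QuadNode& operator[](std::uint32_t i) const { return nodes_[i]; }
    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<QuadNode> nodes_;  // the arena: all nodes in one allocation
};
```

On a 64-bit platform the 32-bit indices are half the size of pointers, and sibling nodes end up adjacent in memory; whether that helps in a given code is something to measure.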

zozbot234 106 days ago

Stable Rust doesn't have a local allocator construct yet, you can only change the global allocator or use a separate crate to provide a local equivalent.

kmaitreys 106 days ago

Right. I have seen Zig where one needs to specify allocators as well. I'm sorry I'm not well versed enough to know how it makes things better for HPC though?

For now my plan is to write fairly similar style code as one may write in C++/Fortran through MPI bindings in Rust.

convolvatron 106 days ago

if you're using thread level parallelism, there is always a benefit to having a per-thread allocator so that you don't have to take global locks to get memory, since those locks become highly contended.

if you take that one step further and only use those objects on a single core, now your default model is lock-free non-shared objects. at large scale that becomes kind of mandatory. some large shared memory machines even forgo cache consistency because you really can't do it effectively at large scale anyways.

but all of this is highly platform dependent, and I wouldn't get too wrapped up around it to begin with. I would encourage you though to worry first about expressing your domain semantics, with the understanding that some refactoring for performance will likely be necessary.

if you have the patience, both personally and within the project, it can be a lot of fun to really get in there and think about the necessary dependencies and how they can be expressed on the hardware. there's a lot of cool tricks, for example trading off redundant computation to reduce the frequency of communication.
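The per-thread allocator idea above can be sketched in C++ as a thread-local bump arena. This is an illustration under stated assumptions, not a production allocator: `BumpArena` and `thread_arena` are hypothetical names, the 1 MiB capacity is arbitrary, and individual objects are never freed (only the whole arena is reset).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-capacity bump allocator: hands out slices of one buffer, no locks.
class BumpArena {
public:
    explicit BumpArena(std::size_t bytes) : buf_(bytes), used_(0) {}

    // Returns aligned storage, or nullptr when the arena is exhausted.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const auto base = reinterpret_cast<std::uintptr_t>(buf_.data());
        const std::uintptr_t cur = base + used_;
        const std::uintptr_t aligned =
            (cur + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t new_used =
            static_cast<std::size_t>(aligned - base) + size;
        if (new_used > buf_.size()) return nullptr;
        used_ = new_used;
        return reinterpret_cast<void*>(aligned);
    }

    void reset() { used_ = 0; }              // releases everything at once
    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buf_;
    std::size_t used_;
};

// Each thread lazily gets its own arena, so allocation needs no shared lock.
inline BumpArena& thread_arena() {
    thread_local BumpArena arena(1 << 20);   // arbitrary 1 MiB per thread
    return arena;
}
```

Memory obtained from `thread_arena()` should stay on the thread that allocated it, which is the "lock-free non-shared objects" model described above.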

Joel_Mckay 106 days ago

> realtime computing

Even with HDL defined accelerators, that statement may not mean what people assume. =3

https://en.wikipedia.org/wiki/Latency_(engineering)

https://en.wikipedia.org/wiki/Clock_domain_crossing

https://en.wikipedia.org/wiki/Metastability_(electronics)

https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...

https://www.youtube.com/watch?v=G2y8Sx4B2Sk

j4k0bfr 106 days ago

I'm talking almost exactly about this haha. Flight software representation!! Although I have no experience programming FPGAs, I hope to gain some soon. They seem like the ultimate solution to my IO woes.

Joel_Mckay 106 days ago

A bit legacy these days, but I liked the old Zynq 7020 dual core ARM with reasonable LUT counts.

https://www.youtube.com/watch?v=FujoiUMhRdQ

https://github.com/Spyros-2501/Z-turn-Board-V2-Diary

https://www.youtube.com/@TheDevelopmentChannel/playlists

https://myirtech.com/list.asp?id=708

The Debian Linux example includes how to interface hardware ports.

Almost always better to go with a simpler mcu if one can get away with it. Best of luck =3

j4k0bfr 106 days ago

No way, was looking at the Z-7045 for a work project literally today. And yep I agree, simpler solutions have simpler problems lol. Thanks for the recommendation, I'll give it a look!

jeffreygoesto 106 days ago

That chip was hitting a sweet spot in terms of DRAM controller and distributing memory bandwidth between CPU cores and fabric. Xilinx was very afraid of screwing this up and running into bottlenecks. One of the best balanced chips in that regard with a great controller. Your best bet still was to keep everything in blockram as much as possible and only read and write DRAM once at every end of the computation...