Hacker News
by Rapzid 1638 days ago
I watched a conference presentation by ScyllaDB, and a lot of the reasons given for their perf boost from using C++ over Cassandra's Java seem like things C# might address now in 2021. Span<T> in particular is a perf game changer for this kind of stuff.

It would be interesting if C# were now a viable alternative to C++ for them.

3 comments

Hey. ScyllaDB employee here. There are several reasons C++ was used, and I don't think Span ultimately matters. A list off the top of my head (in no particular order):

1) We use intrusive containers, so the memory for the container's bookkeeping structures is colocated with the actual data.

2) Memory allocation is not tied to a GC, so we don't get pauses.

3) There's almost no synchronization between different threads and there are (almost) no globals. For a story of why globals are a killer for performance, read https://www.p99conf.io/2021/09/28/hunting-a-numa-performance...

4) The previous point is only possible thanks to a user-space scheduler which guarantees that specific threads are pinned to a single CPU.

Also, there's no need to call mmap multiple times, as Seastar (a concurrency framework written with Scylla in mind) allocates the whole system memory and takes advantage of overcommitting in Linux. There's no syscall at memory allocation, just some userspace work and a possible page fault.
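For readers who have not met intrusive containers, here is a minimal C++ sketch of the idea in point 1. The names `ListHook`, `Row` and `IntrusiveList` are hypothetical and hand-rolled for illustration; this is not ScyllaDB or Seastar code. The link pointers live inside the element itself, so linking and unlinking never allocate.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hook type: the link pointers live inside the element itself.
struct ListHook {
    ListHook* prev = nullptr;
    ListHook* next = nullptr;
};

// Hypothetical payload type; inheriting the hook makes every Row linkable
// without a separately allocated list node.
struct Row : ListHook {
    int key;
    explicit Row(int k) : key(k) {}
};

// Minimal intrusive doubly linked list. It never allocates and never owns
// its elements; the caller controls their lifetime.
class IntrusiveList {
public:
    IntrusiveList() { head_.prev = head_.next = &head_; }  // empty circular list
    IntrusiveList(const IntrusiveList&) = delete;
    IntrusiveList& operator=(const IntrusiveList&) = delete;

    bool empty() const { return head_.next == &head_; }

    void push_back(Row& r) {
        assert(r.prev == nullptr && r.next == nullptr);  // must not already be linked
        r.prev = head_.prev;
        r.next = &head_;
        head_.prev->next = &r;
        head_.prev = &r;
    }

    // Unlinking is O(1) and needs no search, because the element holds its own links.
    static void unlink(Row& r) {
        assert(r.prev != nullptr && r.next != nullptr);  // must be linked
        r.prev->next = r.next;
        r.next->prev = r.prev;
        r.prev = r.next = nullptr;
    }

    Row& front() {
        assert(!empty());
        return static_cast<Row&>(*head_.next);
    }

    std::size_t size() const {
        std::size_t n = 0;
        for (const ListHook* p = head_.next; p != &head_; p = p->next) ++n;
        return n;
    }

private:
    ListHook head_;  // sentinel node; not a Row
};
```

With a non-intrusive container such as std::list<Row>, each element would sit in a separately allocated node, and removing a known element would need an iterator or a search. Here the caller owns the Row objects and must unlink them before they are destroyed.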

I'm not sure whether C# can do away with these problems. Let me know if you know. That being said, modern C++ is really convenient. It's nothing like what you may have seen 15 years ago at university.

I can assure you that what you would see at a university in 2022 is going to be just like 15 years ago.

As for C#, you can do most C++-like stuff in C# 10.

As a matter of fact, I'm still at university, and mine actually showed a fair bit of modern C++ (University of Warsaw here). So I don't feel assured. As for C#, I don't claim to know stuff. I'd just like to learn something new. If you stop at an assertion like yours, sadly I don't learn anything new.
As proven by occasional threads on /r/cpp, and by complaints from Bjarne himself in some of his talks, that is unfortunately not yet common practice.

Regarding C#, if you really want to learn how to do C++-style programming in C#, have a look at the documentation for C# 7.0 - 7.3, C# 8, C# 9 and C# 10 on readonly structs, Span, stackalloc in safe code, blittable types, GC-free regions, malloc/free calls, allocation-free memory pipelines, in parameters and ref returns, ref locals, and the using pattern (implementing IDisposable is no longer required).

As for classical C# (what was available up to .NET Framework 4.8), you have structs, value types, and manual memory management via System.Runtime.InteropServices.

You can start with the free posters here: https://prodotnetmemory.com/

> actually showed a fair bit of modern C++ (University of Warsaw here)

You mean like C++11? So a C++ standard from 11 years ago? Or C++14? C++17? The last time I checked UW was about 17 years ago, so maybe things have changed, but back then they were 10+ years behind industry in practical terms.

In 2019 it was in C++17. Now it's in C++20.
I don't know about .NET, but many times I have found that the Java HotSpot compiler is weaker at optimisation than C, C++ and Rust compilers.

For instance see this: https://pkolaczk.github.io/overhead-of-optional/

As for databases, C++ has not only a performance edge over Java (and possibly .NET) but it also offers superior non-memory resource management capabilities. Databases manage a lot of resources that are not memory, and RAII is a game changer.

Your linked post is not really a good example for that — escape analysis is very finicky without language-level semantic guarantees the compiler could use. With the proposed Valhalla changes Optional will be a value-class and these optimizations become trivial.

Especially when you return a value, it is more than likely to escape.

Optional is only a part of the picture here. It also missed:

* branch elimination with cmov

* loop unrolling

* SIMD vectorization

* turning heap allocation into stack allocation

All those things could be done without breaking any semantic guarantees of Optional even without value types in place.

Also note how even forcing the Rust program to use references with double Box didn't make the code any worse. So Rust/LLVM had no issue optimizing that out even if Option was defined the way it is in Java now.

A lot of the problem stems from Java’s boxing: because the first n values are cached, escape analysis can’t remove the boxing reliably, and that cannot be fixed without breaking some applications.
Java is capable of all of these optimizations though — but I am not an OpenJDK dev so I’m getting out of my depth here.

Of course you have less time and fewer resources during JIT compilation (and, mostly, limited inlining depth), so the quality of the resulting code can at times be vastly worse than what an AOT compiler can produce, but my experience is that in real-life code bases Java’s JIT compiler is really great, while this benchmark reflects a single case where it failed.

> Java is capable of all of these optimizations though

In theory - yes.

In my experience it just repeatedly does a worse job than a C / C++ / Rust compiler, unless I'm very careful in my Java coding (yes, I can often get close, but this requires very non-idiomatic Java code; e.g. I've seen cases where manually unrolling a loop gave 2x more performance, which is something I don't recall ever having to do in C / C++ / Rust).

For example, we don't use Java Streams in performance-critical code, because everybody on the team knows the JVM does not optimize them back to the level of simple for loops. We checked many times and it simply never happened, although theoretically it could. But I can throw a chain of map/filter/fold calls into C++ or Rust freely and it just runs as fast as a hand-optimized loop, with unrolling, SIMD, etc.

How did you measure it? Unless it is long-running production code or a JMH benchmark, it can be tricky to measure correctly.

(But I’m fairly sure you know that already)

There's no reason to program in C# if you already know C++.
Other than memory safety, simplicity, dependency management and build tooling, the .NET standard libraries, the open source library ecosystem, and so on...
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
> Other than memory safety, simplicity, dependency management and build tooling, the .NET standard libraries, the open source library ecosystem

Not to feed the troll, but apart from memory safety, everything else you list exists in the C++ ecosystem, generally with better alternatives than in C#.

And as soon as you touch mmap, unsafe blocks or native code in C#, you lose memory safety anyway.

> Not to feed the troll, but apart from memory safety, everything else you list exists in the C++ ecosystem, generally with better alternatives than in C#.

Oh? I don't think you can dispute that the C++ standard library is very limited and the state of dependency management / build tooling is very poor. And that actually limits the usability of the open source library ecosystem quite a lot; maybe there are more C++ libraries out there, but you can't just type what you want into the NuGet search bar and get on with using it.

Simplicity is in the eye of the beholder, but the very weak semantics of C++ templates mean you can't reason compositionally about C++ code, whereas in C# it's relatively easy to have a codebase that you can reliably understand piecemeal.

> And as soon as you touch mmap, unsafe blocks or native code in C#, you lose memory safety anyway.

In principle yes, but if you keep those points very rare then you can subject them to extra review etc. at a level that would be impractical with a C++ codebase (where even "a + b" is undefined behaviour in the general case). Memory safety vulnerabilities in real-world C# codebases are rare.
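To illustrate why even `a + b` needs care: signed integer overflow is undefined behaviour in C++, so a checked addition has to test the operands before adding rather than inspect the result afterwards. A minimal sketch, with `checked_add` as a hypothetical helper name, assuming C++17 for std::optional:

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt if the mathematical
// result does not fit in int. The range check happens before the addition,
// so the signed addition itself can never overflow.
std::optional<int> checked_add(int a, int b) {
    constexpr int kMax = std::numeric_limits<int>::max();
    constexpr int kMin = std::numeric_limits<int>::min();
    if (b > 0 && a > kMax - b) return std::nullopt;  // would exceed the int maximum
    if (b < 0 && a < kMin - b) return std::nullopt;  // would go below the int minimum
    return a + b;
}
```

For comparison, C# integer arithmetic wraps on overflow by default and throws OverflowException inside a `checked` context, so the failure mode there is defined either way.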

>simplicity

I didn't spend a lot of hours in the C++ world, but it never felt simple

- N compilers, N package managers, N ways to do everything

> I didn't spend a lot of hours in the C++ world, but it never felt simple

C++ is not simple.

But presenting C# (or Java) as "simple" is equally hypocritical. The JVM or the CLR and their associated frameworks are monsters of complexity, engineering and legacy that require close to an entire lifetime to master entirely.

C# (or Java) are "accessible", meaning a newbie developer can produce something halfway baked in these languages relatively quickly.

And this is something you cannot say about C++.

But they are not in any way "simple".

I don't think you're talking about the same thing.

Just because the JVM or CLR is complex, it *doesn't* mean that writing good C# / Java requires you to be proficient at the CLR/JVM level, or that it is hard because of that.

>meaning a newbie developer can produce something halfway baked in these languages relatively quickly.

A newbie developer can produce mediocre solutions in all of those: C#, Java, C++.

The difference is that in the C#/Java world it may be slow, and in the C++/C world it may be exploitable (more likely) <snark>.

Anyway, in my world very often it's not about internals, but about modeling skills, about OOP, testability. Those are some of the ways of measuring how good the code is.

Good system modeling skills are way above technology

How exactly are they not simple? Well, maybe not C#, because it has a bit of a feature-creep problem similar to C++, but Java is a really tiny language compared to... anything.

And you don’t have to be a master of the JVM because chances are you are not a gcc/clang maintainer and yet you can write performant-enough correct code.

N ways to do something, but in exchange you can get good solutions in C++. In the C# world you are locked into a mediocre compiler, with a mediocre package manager, a substandard (and complicated!) build system and an unacceptable code formatter, for example.
>with a mediocre package manager

What do you mean? Since .NET Core it has always worked flawlessly for me.

>unacceptable code formatter

Hmm? That's a preference, not an argument.

"everything else you list exist in the C++ ecosystem with generally better alternative than in C#."

That seems a little questionable. Maybe "sometimes better"?

The other people you work with may not. That's my #1 reason for choosing a language, despite my own knowledge of C++ :D
Memory safety? Garbage collection? Library ecosystem?
RAII is a kind of garbage collection, too.

Longer-lived objects still need (trickier) manual deallocation.

> RAII is a kind of garbage collection, too.

No it isn't. With RAII, you can look where an object gets constructed and know exactly where it will be destructed. With garbage collection, you can't, and in fact there's no guarantee that it ever will be. Also, with garbage collection, you can save references to whatever you like, wherever you like, for as long as you like. With RAII, you need to make sure you don't create any dangling references or use any dangling pointers.
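The "you know exactly where it will be destructed" part can be shown with a small C++ sketch. `Guard` and `run_scopes` are hypothetical names used only for this illustration; the log records when each object is constructed and destroyed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource wrapper: records acquisition and release in a log so
// the destruction points are observable.
class Guard {
public:
    Guard(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~Guard() { log_.push_back("release " + name_); }

    Guard(const Guard&) = delete;             // exactly one owner per resource
    Guard& operator=(const Guard&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;  // must outlive the Guard, or it dangles
};

// Returns the sequence of events produced by two nested scopes.
std::vector<std::string> run_scopes() {
    std::vector<std::string> log;
    {
        Guard outer("outer", log);
        {
            Guard inner("inner", log);
            log.push_back("work");
        }  // inner is destroyed exactly here
        log.push_back("after inner");
    }  // outer is destroyed exactly here
    return log;
}
```

The release order is fixed by scope nesting, not by a collector's schedule. The flip side mentioned above is visible too: `Guard` holds a reference to the log, and nothing but programmer discipline keeps that reference from outliving its target.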

No, with RAII you still need to design your program around who owns each object, and thus who should clean it up. You end up with borrowing, move semantics and others. With (Tracing/Copying) Garbage Collection, none of this exists.

Not to mention, Copying GC also solves memory fragmentation, which C++ still suffers from unless you also design your allocations carefully around sizes of types.

> No, with RAII you still need to design your program around who owns each object, and thus who should clean it up

With or without RAII you should design your program around who owns each object, unless you want to end up with an unmaintainable mess leaking file descriptors, network sockets, native memory buffers or trying to access resources after closing them. Which is why Cassandra and Netty implement their own reference counting.

> Not to mention, Copying GC also solves memory fragmentation

Not really. It only moves the problem elsewhere so it doesn't look like fragmentation. Compacting GC needs additional memory to have room to allocate from, and that amount of memory is substantial unless you want to do more GC than any useful work. Also it is not free from fragmentation most of the time - the heap is defragmented only at the moment right after compaction. As soon as your program logically frees a memory region (by dropping a path to it), you have temporary fragmentation until the next GC cycle, because that region is not available for allocation immediately. And there is internal fragmentation caused by object headers needed to store marking flags for GC - which can consume a huge amount of memory if your data is divided into tiny chunks.

> which C++ still suffers from unless you also design your allocations carefully around sizes of types

Modern allocators split allocations into size buckets automatically.
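A rough sketch of what "size buckets" means, assuming power-of-two classes with a 16-byte minimum. Both function names are hypothetical, and real allocators generally use more finely spaced classes than this.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical size-class function: rounds a request up to the next power of
// two, with a 16-byte minimum.
std::size_t size_class(std::size_t bytes) {
    assert(bytes > 0 && bytes <= (std::size_t{1} << 30));  // keep the sketch's shift in range
    std::size_t bucket = 16;
    while (bucket < bytes) bucket <<= 1;
    return bucket;
}

// Index of the free list that would serve this request: 16 -> 0, 32 -> 1, ...
std::size_t bucket_index(std::size_t bytes) {
    const std::size_t target = size_class(bytes);
    std::size_t index = 0;
    for (std::size_t b = 16; b < target; b <<= 1) ++index;
    return index;
}
```

Because every block in a bucket has the same size, a freed block can be reused by any later request of that class, which limits external fragmentation. The cost is internal fragmentation: in this sketch a 100-byte request occupies a 128-byte block.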

> Compacting GC needs additional memory to have a room to allocate from, and that amount of memory is substantial unless you want to do more GC than any useful work.

Not in the case of a mark-compact collector, which works entirely in place, or a mark-region collector such as Immix [0], which only copies a small fraction of the heap.

> Also it is not free from fragmentation most of the time - the heap is defragmented only at the moment right after compaction.

An improvement would be to perform more frequent "partial" collections, such as in the Train algorithm [1]. But some collectors (such as Immix again) avoid compaction until fragmentation is considered bad enough, which seems like a fair compromise.

> And there is internal fragmentation caused by object headers needed to store marking flags for GC - which can consume a huge amount of memory if your data is divided into tiny chunks.

The description of Doug Lea's allocator [2] suggests there are also "object headers" of a sort on allocated data in dlmalloc. You could probably steal mark bits from those headers, but it is common to use a marking bitmap/bytemap that is kept separate from the space where objects are allocated, and thus has none of the fragmentation you describe.

[0] https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix...

[1] https://beta.cs.au.dk/Papers/Train/train.html

[2] http://gee.cs.oswego.edu/dl/html/malloc.html
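The separate marking bitmap mentioned above can be sketched in a few lines of C++. This assumes a heap carved into fixed-size granules and one mark bit per granule; `MarkBitmap` is a hypothetical name, and real collectors differ in granule size and layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical side mark table: one bit per granule, stored outside the heap,
// so the objects themselves carry no mark bits.
class MarkBitmap {
public:
    MarkBitmap(std::size_t heap_bytes, std::size_t granule_bytes)
        : granule_(granule_bytes),
          bits_(words_for(heap_bytes, granule_bytes), 0) {}

    // Marks the granule at the given byte offset; returns true if it was
    // previously unmarked (i.e. this is the first visit during tracing).
    bool mark(std::size_t offset) {
        const std::size_t i = offset / granule_;
        assert(i / 64 < bits_.size());  // offset must lie inside the heap
        const std::uint64_t bit = std::uint64_t{1} << (i % 64);
        const bool first_visit = (bits_[i / 64] & bit) == 0;
        bits_[i / 64] |= bit;
        return first_visit;
    }

    bool is_marked(std::size_t offset) const {
        const std::size_t i = offset / granule_;
        assert(i / 64 < bits_.size());
        return ((bits_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // Side-table size in bytes; it depends on heap size, not on object count.
    std::size_t overhead_bytes() const { return bits_.size() * sizeof(std::uint64_t); }

private:
    // Number of 64-bit words needed for one bit per granule.
    static std::size_t words_for(std::size_t heap_bytes, std::size_t granule_bytes) {
        assert(granule_bytes > 0 && heap_bytes % granule_bytes == 0);
        return (heap_bytes / granule_bytes + 63) / 64;
    }

    std::size_t granule_;
    std::vector<std::uint64_t> bits_;
};
```

With 16-byte granules the table costs one bit per 16 bytes, i.e. 1/128 of the heap, regardless of how small the individual objects are.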

Fortunately, with GC, you can avoid thinking about the many small objects you constantly allocate along the way. Most of them will be collected in the next GC run, as young-generation garbage, once they go out of function / block scope. Some of them will travel down the call graph and may end up long-lived, to be collected eventually.

But I agree: for anything that you want to deallocate deterministically, or at least soon enough, you need to track ownership, and care about the lifetimes. Such objects are relatively few, though.

I meant tracing garbage collection. I'd say that something like 95% of allocations in real-world code can be done straightforwardly with RAII, or could be if the language supported it (and indeed gain maintainability benefits from being forced into an RAII-centric paradigm). But the remaining 5% is a real pain, and distributed over a wide variety of problems in a wide variety of domains. So tracing GC really does make life a lot easier, if you can afford it.
The freedom to reference anything easily from any place is a double-edged sword. I agree it makes 5% of hard issues go away, but on the flip side it makes the other 95% more complex. Tracing GC is the "goto" of memory management. You may argue goto is a good thing because it offers you the freedom to jump from anywhere to anywhere, and you're not tied to the constraints enforced by loops and functions. We all know this is not the case. Similarly, being able to make a reference from anywhere to anywhere leads to programs that are hard to reason about. We should optimize for readability, not ease of writing.
There is no reason why you could not, in principle, have Rust-style compile-time borrow checking in a managed language.

As an extreme example (that I have occasionally thought about doing though probably won't), you could fork TypeScript and add ownership and lifetime and inherited-mutability annotations to it, and have the compiler enforce single-ownership and shared-xor-mutable except in code that has specifically opted out of this. As with existing features of TypeScript's type system, this wouldn't affect the emitted code at all—heap allocations would still be freed nondeterministically by the tracing GC at runtime, not necessarily at the particular point in the program where they stop being used—but you'd get the maintainability benefits of not allowing unrestricted aliasing.

(Since you wouldn't have destructors, you might need to use linear instead of affine types, to ensure that programmers can't forget to call a resource object's cleanup method when they're done with it. Alternatively, you could require https://github.com/tc39/proposal-explicit-resource-managemen... to be used, once that gets added to JavaScript.)

Of course, if you design a runtime specifically to be targeted by such a language, more becomes possible. See https://without.boats/blog/revisiting-a-smaller-rust/ for one sketch of what this might look like.