by sally1620 1635 days ago
It is quite interesting that most of the problems mentioned don't exist in recent versions of C# on .NET Core, considering all the similarities between C# and Java.

I would even say some of the problems didn't exist in C# in 2009. C# always had value types with configurable in-memory layout. It also has a very good mmap solution. It also allows hand-optimizing things using unsafe blocks.


> C# always had value types with configurable in-memory layout. It also has a very good mmap solution. It also allows hand-optimizing things using unsafe blocks.

And C has inline assembly. Doesn't mean that most C code will use inline assembly.

Back in 2009, a lot of git utilities were still written in scripting languages. Not sure when it started, but the porting activity of those utilities to C is still ongoing. So the maintainers still want to use a lower level language today.

In other projects in the VCS space, we are seeing a similar trend. Hg, originally a project written in Python, is being rewritten in Rust by Facebook, one of the big users of it.

Sure, maybe you could have used C# together with some niche features. But it's not going to be fun compared to a language that has zero cost abstractions and that runs on the bare metal.

Even if your problem domain demands a managed environment, like extensibility with plugins, I still suggest you use Rust together with wasm. It's the first choice thanks to its great type system, powerful static analyzer and first-class support for resource management that garbage-collected languages lack.

Is there a term for this phenomenon yet?

"if another language is being discussed, Rust must be forced into the discussion, no matter how tenuous the connection"

I think in this scenario it's totally germane to mention Rust because the problem described in the linked post is exactly the problem that Rust was designed to solve: providing sufficiently precise control over low-level runtime behavior that you never hit a "sorry, it's not possible to do that optimization in this language" situation, while still (arguably? hopefully?) qualifying as a "higher-level language" in the relevant sense. In particular, every problem with Java that the post describes has a straightforward solution in Rust, and this kind of thing is why Rust exists instead of, e.g., Mozilla just rewriting Firefox in an existing managed language with a garbage collector.

That being said, GP seems to imply that Rust should be the default choice for basically every problem, which goes way too far. Not every application needs this kind of low-level control. Maybe even most don't (although I look forward to a future where it's easy to drop into Rust from a managed language when you hit a performance wall; I think this has been mostly achieved for Python, but not yet for other languages). But some do, and it sure sounds like Git's one of them.

Rust is a low level language no matter how productive it may be.

The memory layout will simply leak into the program architecture and will have to be altered on refactors — something which is transparent with managed languages.

What do you mean here by memory layout? For instance, the order of fields in a Rust struct can (theoretically) change by recompiling. It's not defined by the order of fields in the definition.
On a language level, high-level APIs will necessarily contain details such as (mut) references, Box, and so on. That is not a problem at all given the problem domain, but in my opinion it is not possible to make a language that is both low level and high level at the same time (and it is not really needed either).
Unless you add a repr(C) attribute for C interop.
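
For anyone who wants to poke at this, here is a minimal sketch of the difference. The struct names are made up for illustration. With the default representation the compiler is free to reorder fields, so the size it prints is whatever your compiler picks; with repr(C) the declaration order and C padding rules apply.

```rust
use std::mem::{align_of, size_of};

// Default (Rust) representation: the compiler may reorder fields to
// reduce padding, so the exact layout is not guaranteed.
#[allow(dead_code)]
struct DefaultLayout {
    a: u8,
    b: u32,
    c: u16,
}

// repr(C): fields stay in declaration order with C padding rules,
// which is what you want for C interop.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u16,
}

fn main() {
    println!(
        "default: size {} align {}",
        size_of::<DefaultLayout>(),
        align_of::<DefaultLayout>()
    );
    println!(
        "repr(C): size {} align {}",
        size_of::<CLayout>(),
        align_of::<CLayout>()
    );
}
```

On common platforms where u32 is 4-byte aligned, the repr(C) version needs padding after `a` and after `c`, while the default version is usually smaller because the fields can be packed more tightly.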
Git is the subject of the linked e-mail. Mercurial is the big contender to git that is not written in a C language. Their response to Hg's performance issues was not to use or create some Python feature that allows them to speed up some fast paths, but to use a proper low level language in the first place, which happens to be Rust. I'm not sure you can get more relevant to the discussion than this.

The trend seems to go away from high level languages in the VCS space. Developer time is one of the most expensive resources that FANG pays for, so any kind of investment in performance improvements is going to pay off quite well.

> Is there a term for this phenomenon yet? "if another language is being discussed, Rust must be forced into the discussion, no matter how tenuous the connection"

Rustrusion

This always happens with whatever language is in vogue at the time. Now it’s Rust. It used to be Go (which still has a little juice left). Before that, Clojure and Haskell both had runs. And before that… hell, I remember when Java was talked about this way.

This is the natural order of things and is good.

And the proper term for introducing Rust should be “oxidation”.

Elixir, RoR and Node.js (and Python a couple of times) spring to mind. Some of those languages have found a niche. But a lot of new languages made older languages nicer, because the older ones adopted their language/framework features.
Arguably, carcinization [0].

[0] https://en.wikipedia.org/wiki/Carcinisation

I’m not a…rustafarian?…but we didn’t get as cross when C# was mentioned above, in a thread about Java and C. In fact it’s top comment at my time of reading.
Well, I would prefer that people would discuss alternate systems programming languages when "C is fast" comes up.

We could use some perspective from, say, Ada programmers. Unfortunately, none of them ever seem to show up.

> say, Ada programmers.

I stand summoned.

> Unfortunately, none of them ever seem to show up.

We do from time to time, but people assume our language is dead (it isn't). I learned it last year and I've been very impressed by how simple it is, given the speed you get with it.

It was a "big language" at the time, but now it's a language smaller than Rust or C++ which offers good performance with straightforward syntax. Ada also has a package manager now which includes toolchain install.

Ada has inline assembly, easy usage of compiler intrinsics, dead-simple binding to C, built-in multi-tasking (which includes CPU pinning), a good standard library, RAII, and real honest-to-goodness built-in, not-null-terminated strings. It's a compiled language, so you get good speed in general, but the built-in concurrency really does help work which can be split up. Ada 202x is getting even finer grained parallelism (parallel for-loops) in the language itself to even further help this.

- https://alire.ada.dev/

- https://learn.adacore.com/

- https://github.com/pyjarrett/programming-with-ada

- https://en.wikibooks.org/wiki/Ada_Programming

> but people assume our language is dead

And/or a lot of misconceptions. I showed up many times as well with those links, and explanations and whatnot.

I recommend https://blog.adacore.com/, too. Ada/SPARK is great when you want formal verification, and your checks to be done by GNATprove; statically, instead of dynamically. FWIW, you can disable runtime checks in Ada.

I also commented https://docs.adacore.com/live/wave/spark2014/html/spark2014_... not too long ago. The whole documentation is useful anyway. You can prove the absence of memory leaks, among a lot of other stuff!

> And/or a lot of misconceptions.

I've tried too. I have an article about some of these:

- https://pyjarrett.github.io/programming-with-ada/clearing-th...

I've heard all sorts of things about ADA. The main thing keeping me from delving in has been the lack of general info about it. Thank you for the links! I'll be taking a look through these. What kinds of projects are people building in ADA these days? I'm interested in it primarily for robotics.
I use Ada as my alternative to C, when I don't feel like doing C++.

I've written a few tools for myself, including a command line code discover tool for large code bases (tens of millions of lines). There's a bunch of embedded work being done with it.

Make sure you use "Ada" rather than "ADA". Some people might give you trouble about it--it's not an acronym, just a name :)

Ada is a bit verbose for my tastes. Nim [1] is fast like C - I have yet to really find anything rewritten in Nim be slower. It's safe-ish like Rust { there is an easily identifiable subset of unsafe constructs }. It's kind of like Ada, but with Lisp-like syntax macros/meta programming and Python-like block indentation (Lisp folks always said they "read by indentation" anyway). Nim also has user definable operators and many other features. Compile times are very small while the stdlib is big-ish.

Small sample statistics, but three or four times now I have re-written Rust in Nim and the Nim ran faster. Once you can do inline assembly/intrinsics in a PL, most "real world" benchmarks reduce to a measure of dev patience/time/energy not the language. They also become "multi-language" solutions (if you count SIMD asm as a language which I think one should). Even slow Python allows C/Cython modules which in the real world are absolutely fair game, and you can call SIMD intrinsics from Cython pretty easily, too. Since we have few ways to quantify dev patience/attention objectively, these "my PL is faster than yours" discussions are usually pretty pointless.

[1] https://nim-lang.org/

They don't show up because they're not out evangelizing at every opportunity they get.
And perhaps that's why other languages are more popular?

Akin's Laws of Spacecraft Design are appropriate here:

> 20. A bad design with a good presentation is doomed eventually. A good design with a bad presentation is doomed immediately.

The old term for .NET/Java was "Managed" languages. "Managed C++", "C# is a managed language", because they all manage your memory for you.

Rust's primary language feature - the borrow checker - is about adding compile-time checks on resource management (mainly memory), and the original article talks about boxed vs. value types being a major source of inefficiency.

So talking about Rust in a comparison of C and Java mentioning memory indirection bottlenecks seems about the most relevant place to discuss it.

When most people talk about C# and Java, they are mostly referring to application development. You rarely hear of these languages in systems programming (doable, just rare). Rust is at the C/C++ level when it comes to systems programming; it eliminates a lot of C/C++ issues and yet adds features found in Java and C#, and even Haskell. People just don't know enough about Rust to criticize it, and yet they see it mentioned everywhere. I can understand if some feel a bit "fed-up" seeing Rust brought up in a non-Rust thread. But I do agree with you, Rust is very relevant for discussion here.
The article title is "Why is C Faster than Java".

I would expect to see Java, C#, C, C++, and Rust mentioned quite a bit in the threads here. It's all relevant.

Based on the article, the title should be, "Why is C Git faster than JGit." It's literally nothing but that.
I believe not mentioning Rust whenever possible is strictly forbidden as "mean behavior" in the Rust Code of Conduct.
It's actually the opposite. If anything, being evangelical about Rust is heavily discouraged.

The truth is Rust is an amazing language, with its own warts (async, Pin, etc.), but there is pent-up demand for a language that fits its description: a non-manual, non-GC, low-level oriented language. It's no wonder some projects are switching to Rust.

What exactly is switching to Rust?
Hg, in context of this discussion, but even Dropbox moved some of its software to Rust.
Hype
Can hardly blame people for talking about modern languages in a discussion about obsolete ones.
> Can hardly blame people for talking about modern languages in a discussion about obsolete ones.

The point is that the issue does not involve people discussing "modern languages", just mindlessly shoehorning references to Rust into any discussion involving any application of a language which is not Rust.

I get Rust fanboys are excited about their hobby, but this sort of obsessive "when the only tool you have is a hammer" discussion is very tiring and fruitless, and only conveys a poor image of Rust's community.

So, let me get this straight: We have a thread about a programming language (Java), then it gets compared to another programming language (C#), then it gets compared to a third one (C) and no one bats an eye. But when Rust is mentioned it's because of "fanboys". Yeah, sure.
> So, let me get this straight: We have a thread about a programming language (Java) (...)

No, you really don't. If you read the thread you're commenting on, you'll notice it's about C#.

The very first comment of the thread you're discussing in, and also the top post of this discussion, is, and I quote:

> It is quite interesting that most of the problems mentioned don't exist in recent version of C# on .NET Core, considering all the similarities of C# and Java. (...)

And somehow Rust fanboys parachute into the discussion to yet again talk about their hammer handling all nails and nail-like problems.

> But it's not going to be fun compared to a language that has zero cost abstractions

C# has them. For instance, interfaces used as generic type constraints are zero cost.

Another thing, some C# abstractions are very low cost. Critically to this thread, Span<T> abstraction is low cost, pretty much the same thing as a pointer+length in C. It's easy to design an abstraction which uses spans of bytes backed by a memory-mapped file, and the performance going to be pretty similar to C.
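
Not C#, but the same pointer-plus-length idea can be sketched in Rust, where a byte slice `&[u8]` plays a similar role to a span. This is only an analogy under that assumption, not a statement about how Span<T> is implemented; `checksum` and the buffer are hypothetical, and a plain Vec stands in for a memory-mapped file.

```rust
use std::mem::size_of;

// Sum a window of bytes without copying them. `window` is a borrowed view:
// in practice just a pointer and a length.
fn checksum(window: &[u8]) -> u32 {
    window.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    // Stand-in for a memory-mapped file: 256 bytes with values 0..=255.
    let buffer: Vec<u8> = (0..=255).collect();

    // Taking a sub-slice creates a new view, not a new allocation.
    let header = &buffer[0..4];
    println!("header checksum: {}", checksum(header));

    // On current compilers a slice reference is two machine words.
    println!(
        "slice reference is {} bytes, usize is {} bytes",
        size_of::<&[u8]>(),
        size_of::<usize>()
    );
}
```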

> C# has them. For instance, interfaces used as generic type constraints are zero cost.

Depends on what we mean by 'zero cost'. For instance, interface constraints themselves may not have a 'cost', but there are many cases where this means that the calls involving that generic type will be virtual (unless you're doing fun patterns like 'where TComparer : struct, IEqualityComparer<T>'). If you poke around at the internals of System.Linq you'll see there's a lot of checking to use specialized types depending on the collection in order to minimize costs.

And that's what you'll see a lot of in the .NET Standard bits; even in the past we've had some fairly low cost abstractions in places. SocketAsyncEventArgs, if a little arcane at first, is a good design for its time, and System.Linq.Expressions has been a great way for users to minimize the cost of things like reflection without having to write bytecode.

That said, some abstractions are deceptively costly; the 'new' generic constraint is definitely not zero cost, unless that got fixed in 6.0.

> unless you're doing fun patterns like 'where TComparer : struct, IEqualityComparer<T>'

These fun patterns are precisely the generic type constraints I mentioned in my comment. I do use them when performance matters, here’s an open-source example: https://github.com/Const-me/Vrmac/blob/1.2/Vrmac/Draw/Main/I... That code is from a 2D vector graphics library, these interface methods may be called at 10 kHz frequency or more. Displays are often 60 Hz, and the methods are called a couple of times for every vector path being rendered.

> If you poke around at the internals of System.Linq you'll see there's a lot of checking to use specialized types depending on the collection in order to minimize costs.

Linq is awesome, but I’m pretty sure it was designed for usability first, performance second. I tend to avoid Linq (and dynamic memory allocations in general; delegates are using the heap) on performance-critical paths. YMMV but in most of the code I write, these performance-critical paths are taking way under 50% of my code bases.

> 'new' generic constraint is definitely not zero cost

If you mean the overhead of Activator.CreateInstance<T> when generic code calls new() with the generic type, I’m not 100% certain but I think it’s fixed now. According to https://source.dot.net/, that standard library method is marked with [Intrinsic] attribute, the runtime and JIT probably have optimizations for value types.

C doesn't have inline assembly; it is a common language extension.

An ISO C certified compiler isn't required to support it.

You should read ISO/IEC 9899:2011 J.5.10 "The asm keyword". It's the same section in the C18 standard. It's the bit describing the way an ISO C certified compiler shall provide inline assembly.
I am fully aware of it, it clearly specifies that it is implementation specific.

Two C certified compilers for the same platform are free to provide completely different behaviours for what asm is supposed to do.

Anyone that cares about compilers does actually read ISO documents.

The comparison is against Java because it has certain feature parity with C#. And it is right, C# code can be brought closer to C level of performance with less effort than in Java.
I watched a conference presentation by Scylla DB and a lot of the reasons given for their perf boost using C++ over Cassandra's Java seem like C# might address now in 2021. Span<T> in particular is a perf game changer for this kind of stuff.

Would be interesting if C# now would be a viable alternative to C++ for them.

Hey. ScyllaDB employee here. There are several reasons C++ was used and I don't think span ultimately matters. A list off the top of my head (ordered randomly):

1) We use intrusive containers, so the memory for managing container data structures is colocated with the actual data.

2) Memory allocation is not tied to a GC, so we don't get pauses.

3) There's almost no synchronization between different threads and there are (almost) no globals. For a story on why globals are a killer for performance, read https://www.p99conf.io/2021/09/28/hunting-a-numa-performance...

4) The previous point is only possible thanks to a user space scheduler which guarantees that specific threads are pinned to a single CPU.

Also, there's no need to call mmap multiple times, as Seastar (a concurrency framework written with Scylla in mind) allocates the whole system memory and takes advantage of overcommitting in Linux. There's no syscall at memory allocation, just some userspace work and a possible page fault.

I'm not sure whether C# can do away with these problems? Let me know if you know. That being said, modern C++ is really convenient. Not anything you've seen 15 years ago in university.

I can assure you that what you would see in a university in 2022 is going to be just like 15 years ago.

As for C# you can do most C++ like stuff in C# 10.

As a matter of fact, I'm still at university and mine actually showed a fair bit of modern C++ (University of Warsaw here). So I don't feel assured. As for C#, I don't claim I know stuff. I'd just like to learn something new. If you stop at an assertion like yours, sadly I don't learn anything new.
As proven by occasional threads on /r/cpp and by complaints from Bjarne himself in some of his talks, that is unfortunately not yet a common practice.

Regarding C#, if you really want to learn how to do C++ style programming in C#, have a look at the documentation for C# 7.0 - 7.3, C# 8, C# 9 and C# 10 on readonly structs, span, stackalloc in safe code, blittable types, GC free regions, malloc/free calls, allocation free memory pipelines, in and return ref types, local references, and the using pattern (implementing IDisposable is no longer required).

Regarding classical C# (what is available until .NET Framework 4.8), you have structs, value types, manual memory management via System.Runtime.InteropServices.

You can start at the free posters here, https://prodotnetmemory.com/

> actually showed a fair bit of modern C++ (University of Warsaw here)

You mean like C++11? So the C++ standard from 11 years ago? Or C++14? C++17? The last time I checked UW was about 17 years ago, so maybe things have changed, but back then they were 10+ years behind industry in practical terms.

In 2019 it was in C++17. Now it's in C++20.
I don't know about .NET, but many times I have found that the Java HotSpot compiler is weaker in optimisation strength than C, C++ and Rust compilers.

For instance see this: https://pkolaczk.github.io/overhead-of-optional/

As for databases, C++ has not only a performance edge over Java (and possibly .NET) but it also offers superior non-memory resource management capabilities. Databases manage a lot of resources that are not memory, and RAII is a game changer.

Your linked post is not really a good example for that — escape analysis is very finicky without language-level semantic guarantees the compiler could use. With the proposed Valhalla changes Optional will be a value-class and these optimizations become trivial.

Especially when you return a value, it is more than likely to escape.

Optional is only a part of the picture here. It also missed:

* branch elimination with cmov

* loop unrolling

* SIMD vectorization

* turning heap allocation into stack allocation

All those things could be done without breaking any semantic guarantees of Optional even without value types in place.

Also note how even forcing the Rust program to use references with double Box didn't make the code any worse. So Rust/LLVM had no issue optimizing that out even if Option was defined the way it is in Java now.
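
As a side note on why the Rust side starts from a better position: this is not the benchmark from the linked post, just a minimal sketch (`find_even` is a hypothetical helper). Rust documents that Option around a reference or a Box uses the null pointer as the None case, so the wrapper adds no extra space and no allocation of its own.

```rust
use std::mem::size_of;

// Return a reference to the first even value, if any. The Option wrapper
// does not allocate; it is just a nullable pointer under the hood.
fn find_even(values: &[u64]) -> Option<&u64> {
    values.iter().find(|v| **v % 2 == 0)
}

fn main() {
    let values = [1, 3, 4, 7];
    match find_even(&values) {
        Some(v) => println!("first even value: {v}"),
        None => println!("no even value"),
    }

    // Option<&T> and Option<Box<T>> are the same size as the bare pointer.
    println!(
        "&u64: {} bytes, Option<&u64>: {} bytes",
        size_of::<&u64>(),
        size_of::<Option<&u64>>()
    );
    println!(
        "Box<u64>: {} bytes, Option<Box<u64>>: {} bytes",
        size_of::<Box<u64>>(),
        size_of::<Option<Box<u64>>>()
    );
}
```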

A lot of the problem stems from Java’s boxing: because the first n values are cached, escape analysis can’t remove the boxing reliably, and that cannot be fixed without breaking some applications.
Java is capable of all of these optimizations though — but I am not an OpenJDK dev so I’m getting out of my depth here.

Of course you have less time/resources during JIT compilation (and mostly, inline depth), so the quality of the resulting code can at times be vastly worse than what an AOT compiler can do, but my experience is that in real life code bases Java’s JIT compiler is really great, while this benchmark reflects on a singular case where it failed.

> Java is capable of all of these optimizations though

In theory - yes.

In my experience it just repeatedly does a worse job than a C / C++ / Rust compiler, unless I'm very careful in Java coding (yes, I can often make it close, but this requires way non-idiomatic Java code; e.g. I've seen cases when manually unrolling a loop helped getting 2x more performance, which is something I don't recall ever having to do in C / C++ / Rust).

For example we don't use Java Streams in performance critical code, because everybody on the team knows it does not optimize them back to the level of simple for loops. Well, we checked many times and it simply never happened, although, theoretically it could. But I can throw a chain of map/filter/fold calls in C++ or Rust freely and it just works as fast as a hand-optimized loop, with unrolling, simd, etc.
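
To make the comparison concrete, here is a minimal sketch of the two styles in Rust (function names are made up for illustration). It only shows that the chain and the hand-written loop compute the same thing; whether they compile to the same machine code depends on the compiler and optimization level, so measure before relying on it.

```rust
// Iterator chain: map / filter / fold, written the "high-level" way.
fn sum_even_squares_iter(values: &[u64]) -> u64 {
    values
        .iter()
        .map(|&v| v * v)
        .filter(|&sq| sq % 2 == 0)
        .fold(0u64, |acc, sq| acc + sq)
}

// The same computation as a plain loop.
fn sum_even_squares_loop(values: &[u64]) -> u64 {
    let mut total = 0u64;
    for &v in values {
        let sq = v * v;
        if sq % 2 == 0 {
            total += sq;
        }
    }
    total
}

fn main() {
    let values: Vec<u64> = (1..=10).collect();
    println!("iterator chain: {}", sum_even_squares_iter(&values));
    println!("plain loop:     {}", sum_even_squares_loop(&values));
}
```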

There's no reason to program in C# if you already know C++.
Other than memory safety, simplicity, dependency management and build tooling, the .NET standard libraries, the open source library ecosystem, and so on...
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
> Other than memory safety, simplicity, dependency management and build tooling, the .NET standard libraries, the open source library ecosystem

Not feeding the troll, but outside memory safety, everything else you list exists in the C++ ecosystem with generally better alternatives than in C#.

And as soon as you do touch mmap, an unsafe area or native code in C#, you lose memory safety too anyway.

> Not feeding the troll, but outside memory safety, everything else you list exists in the C++ ecosystem with generally better alternatives than in C#.

Oh? I don't think you can dispute that the C standard library is very limited and the state of dependency management / build tooling is very poor. And that actually limits the usability of the open source library ecosystem quite a lot; maybe there are more C++ libraries out there, but you can't just type what you want into the nuget search bar and get on with using it.

Simplicity is in the eye of the beholder, but the very weak semantics of C++ templates mean you can't reason compositionally about C++ code, whereas in C# it's relatively easy to have a codebase that you can reliably understand piecemeal.

> And as soon as you do touch mmap, an unsafe area or native code in C#, you lose memory safety too anyway.

In principle yes, but if you keep those points very rare then you can subject them to extra review etc. at a level that would be impractical with a C++ codebase (where even "a + b" is undefined behaviour in the general case). Memory safety vulnerabilities in real-world C# codebases are rare.

>simplicity

I didn't spend a lot of hours in the C++ world, but it never felt simple

- N compilers, N package managers, N ways to do everything

> I didn't spend a lot of hours in the C++ world, but it never felt simple

C++ is not simple.

But presenting C# (or Java) as "simple" is equally hypocritical. The JVM or the CLR and their associated frameworks are monsters of complexity, engineering and legacy that require close to an entire lifetime to be mastered entirely.

C# (or Java) are "accessible", meaning a newbie developer can produce something halfway baked in these languages relatively quickly.

And this is something you can not say about C++.

But they are not in any way "simple".

N ways to do something, but in exchange you can get good solutions in C++. In the C# world you are locked to a mediocre compiler, with a mediocre package manager, a substandard (and complicated!) build system and an unacceptable code formatter, for example.
"everything else you list exist in the C++ ecosystem with generally better alternative than in C#."

That seems a little questionable. Maybe "sometimes better"?

The other people you work with may not, #1 reason to choose a language despite my own knowledge of C++ :D
Memory safety? Garbage collection? Library ecosystem?
RAII is a kind of garbage collection, too.

Longer-lived objects still need (trickier) manual deallocation.

> RAII is a kind of garbage collection, too.

No it isn't. With RAII, you can look where an object gets constructed and know exactly where it will be destructed. With garbage collection, you can't, and in fact there's no guarantee that it ever will be. Also, with garbage collection, you can save references to whatever you like, wherever you like, for as long as you like. With RAII, you need to make sure you don't create any dangling references or use any dangling pointers.
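
The "know exactly where it will be destructed" part can be shown with a minimal sketch. It is in Rust rather than C++, where the Drop trait plays the role of a destructor; `Guard` and the shared log are hypothetical names used only to record the order of events.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A guard that records when it is destroyed.
struct Guard {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Guard {
    // Runs deterministically when the guard goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn run(log: &Rc<RefCell<Vec<String>>>) {
    let _outer = Guard { name: "outer", log: Rc::clone(log) };
    {
        let _inner = Guard { name: "inner", log: Rc::clone(log) };
        log.borrow_mut().push("inside inner scope".to_string());
    } // `_inner` is dropped here, at the closing brace.
    log.borrow_mut().push("after inner scope".to_string());
} // `_outer` is dropped here.

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    run(&log);
    for line in log.borrow().iter() {
        println!("{line}");
    }
}
```

The log ends up in the order: inside inner scope, drop inner, after inner scope, drop outer. With a tracing GC there is no equivalent point in the source where you can say the cleanup has happened.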

No, with RAII you still need to design your program around who owns each object, and thus who should clean it up. You end up with borrowing, move semantics and others. With (Tracing/Copying) Garbage Collection, none of this exists.

Not to mention, Copying GC also solves memory fragmentation, which C++ still suffers from unless you also design your allocations carefully around sizes of types.

> No, with RAII you still need to design your program around who owns each object, and thus who should clean it up

With or without RAII you should design your program around who owns each object, unless you want to end up with unmaintainable mess leaking file descriptors, network sockets, native memory buffers or trying to access resources after closing them. Which is why Cassandra and Netty implement their own reference counting.

> Not to mention, Copying GC also solves memory fragmentation

Not really. It only moves the problem elsewhere so it doesn't look like fragmentation. Compacting GC needs additional memory to have room to allocate from, and that amount of memory is substantial unless you want to do more GC than any useful work. Also it is not free from fragmentation most of the time - the heap is defragmented only at the moment right after compaction. As soon as your program logically frees a memory region (by dropping a path to it), you have temporary fragmentation until the next GC cycle, because that region is not available for allocation immediately. And there is internal fragmentation caused by object headers needed to store marking flags for GC - which can consume a huge amount of memory if your data is divided into tiny chunks.

> which C++ still suffers from unless you also design your allocations carefully around sizes of types

Modern allocators split allocations into size buckets automatically.

I meant tracing garbage collection. I'd say that something like 95% of allocations in real-world code can be done straightforwardly with RAII, or could be if the language supported it (and indeed gain maintainability benefits from being forced into an RAII-centric paradigm). But the remaining 5% is a real pain, and distributed over a wide variety of problems in a wide variety of domains. So tracing GC really does make life a lot easier, if you can afford it.
The freedom to reference anything easily from any place is a double-edged sword. I agree it makes 5% of hard issues go away, but on the flip side it makes the other 95% more complex. Tracing GC is the "goto" of memory management. You may argue goto is a good thing because it offers you freedom to jump from anywhere to anywhere and you're not tied to constraints enforced by loops and functions. We all know this is not the case. Similarly, being able to make a reference from anywhere to anywhere leads to programs that are hard to reason about. We should optimize for readability, not the ease of writing.