by winwang 455 days ago
Are there benchmarks you can share? Not discounting Rust, just wondering if you're already seeing some obvious numbers.
2 comments

Sorry, not yet, but this is the direction ArkFlow is working hard toward. Rust's own potential will also guide ArkFlow in this direction.
Rust is rather heavy on its copy/clone imposed semantics making it potentially less suitable for low-latency or large data volume processing workloads. Picking Rust for its performance potential only means that you're going to have a harder time beating other native performance-oriented stream processing engines written in either C or C++, if that is your goal of course.

This logic

> written in rust will have better performance, lower latency, ..., lower memory footprint

is flawed and is cargo-cult programming unless you say what you are objectively comparing it against and how you intend to achieve those goals. Picking the right™ language just for the sake of these goals won't get you too far.

> Rust is rather heavy on its copy/clone imposed semantics making it potentially less suitable for low-latency or large data volume processing workloads. Picking Rust for its performance potential only means that you're going to have a harder time beating other native performance-oriented stream processing engines written in either C or C++, if that is your goal of course.

There is absolutely nothing in Rust's semantics preventing you from writing high-performance data processing workloads in it, and in fact it's one of the best languages for that purpose. Beyond that, the usual barrier to entry for working on a product like this written in C++ is incredibly high in part because stability and safety are so critical for these products--which is one of the reasons that in practice they are often written in memory safe languages, where C++ is not even an option. Have you worked on any nontrivial Rust data processing product where "copy/clone imposed semantics" somehow prevented you from getting big performance wins? I'd be very curious to hear about this if so.
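A minimal sketch of the point about semantics, not taken from any real engine: `Record`, `total_payload_bytes`, and `keep_even_keys` are hypothetical names made up for illustration. It shows that passing a batch by reference or by move does not duplicate payload buffers; a deep copy only happens where `.clone()` is written explicitly, and this sketch never calls it.

```rust
// Hypothetical record type, for illustration only.
#[derive(Debug)]
struct Record {
    key: u64,
    payload: Vec<u8>,
}

// Borrows the batch: only a pointer and a length are passed, no payload bytes are copied.
fn total_payload_bytes(batch: &[Record]) -> usize {
    batch.iter().map(|r| r.payload.len()).sum()
}

// Takes ownership of the batch: each Record is moved, so the heap buffer
// behind `payload` is handed over rather than duplicated.
fn keep_even_keys(batch: Vec<Record>) -> Vec<Record> {
    batch.into_iter().filter(|r| r.key % 2 == 0).collect()
}

fn main() {
    let batch: Vec<Record> = (0..4u64)
        .map(|k| Record { key: k, payload: vec![0u8; 1024] })
        .collect();

    // Remember where the first payload lives on the heap.
    let ptr_before = batch[0].payload.as_ptr();

    let bytes = total_payload_bytes(&batch); // borrow
    let kept = keep_even_keys(batch);        // move; `batch` is unusable after this

    // The surviving record still points at the same heap buffer.
    assert_eq!(kept[0].payload.as_ptr(), ptr_before);
    println!("{} bytes, {} records kept", bytes, kept.len());
}
```

Whether a real pipeline stays clone-free depends on its ownership design (shared state, fan-out to several consumers, and so on), so this illustrates only the language default, not any particular engine's performance.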

Stability and safety are the least of the concerns in data processing and database workloads. That's totally not the reason why we saw an increase of these systems during the 90s and early 00s written in Java or similar alternative languages. It was ease of use, low-entry bar into the ecosystem and generally developer pool accessibility. Otherwise, the cost is the main driver in infrastructure type of software and the reason why we see many of these rewritten exactly in C++. Rust is just another contender here, and it's usually because of the performance and a lot of hype recently, which is fair.
> Stability and safety are the least of the concerns in data processing and database workloads. That's totally not the reason why we saw an increase of these systems during the 90s and early 00s written in Java or similar alternative languages.

not_sure_if_serious.jpg

To be extra clear about it (and to avoid pure snark, that's frowned upon here at HN): that's the kind of software (alongside a lot of general enterprise code) that got rewritten from C++ to Java, not the other way around. The increased safety of Java was absolutely a consideration. Java was the 'Rust' of the mid-to-late 1990s and 2000s, only a whole lot slower and clunkier than the actual Rust of today.

I am serious. C is a simple language but rather hard to wrap your head around, since it requires familiarity with low-level machine concepts. C++ ditto, with the difference that it is a rather complicated language with rather advanced programming language concepts, something that did not really exist at that time. So the net result was a very high entry barrier, and this was the main reason, and not "safety" as you say, why many people were running away from C and C++ to Java/C#, because those were the only alternatives we had at that time. I don't remember "safety" being mentioned at all during the past 20 years or so, up until Rust came out. "Segfaults" were the 90s and 00s "safety" vocabulary but, as I said, it was a skill issue.

The frenzy around "safety" is IMO way overhyped, and when you and OP say that "safety" plays a huge role in data processing and database kernel source development, no: it is literally not even 1% of what a developer in that domain spends his time on. C and C++ are still used in those domains full on.

> that's the kind of software (alongside a lot of general enterprise code) that got rewritten from C++ to Java, not the other way around

Which C or C++ engines exactly got rewritten to Java? We can start from this list: https://db-engines.com/en/ranking

> Stability and safety are the least of the concerns in data processing and database workloads.

I'm curious how you came to this conclusion?

I have professional working experience in this domain.
Why do you have to beat a native performance-oriented streaming engine written in C or C++?

Currently, most of the mainstream stream processing engines are written in Java. Sorry, I may have left out some qualifiers, which caused the misunderstanding.

Software has no silver bullets, and neither do programming languages; each has its own strengths. I also like to use Go and Java to develop software.

So if you don't want to beat native engines in performance, what is it that you're trying to solve that Java-based engines don't? I think it's pretty important to set a vision upfront, otherwise you're setting yourself up for a quick failure.
Hi, brother! I will seriously consider what you said, and I am honored to talk with you.
You are welcome to follow the latest news from ArkFlow at any time, and even to participate.