by esafak 455 days ago
Is there a product motivation, such as a deficiency in existing solutions that you seek to rectify?

I think a stream processing engine written in Rust will have better performance, lower latency, more stable services, a lower memory footprint, and cost savings. In addition, ArkFlow is built on DataFusion, which gives it the backing of a strong open source community.
Are there benchmarks you can share? Not discounting Rust, just wondering if you're already seeing some obvious numbers.
Sorry, not yet, but this is what ArkFlow is working hard toward. Rust's own potential will also guide ArkFlow in this direction.
Rust is rather heavy on its copy/clone imposed semantics making it potentially less suitable for low-latency or large data volume processing workloads. Picking Rust for its performance potential only means that you're going to have a harder time beating other native performance-oriented stream processing engines written in either C or C++, if that is your goal of course.

This logic

> written in rust will have better performance, lower latency, ..., lower memory footprint

is flawed, and amounts to cargo-cult programming unless you say what you are objectively comparing it against and how you intend to achieve those goals. Picking the right™ language for the sake of these goals alone won't get you very far.

> Rust is rather heavy on its copy/clone imposed semantics making it potentially less suitable for low-latency or large data volume processing workloads. Picking Rust for its performance potential only means that you're going to have a harder time beating other native performance-oriented stream processing engines written in either C or C++, if that is your goal of course.

There is absolutely nothing in Rust's semantics preventing you from writing high-performance data processing workloads in it, and in fact it's one of the best languages for that purpose. Beyond that, the usual barrier to entry for working on a product like this written in C++ is incredibly high, in part because stability and safety are so critical for these products. That is one of the reasons they are, in practice, often written in memory-safe languages, where C++ is not even an option. Have you worked on any nontrivial Rust data processing product where "copy/clone imposed semantics" somehow prevented you from getting big performance wins? I'd be very curious to hear about it if so.
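To make the copy/clone point concrete, here is a minimal sketch (not ArkFlow code; `Event`, `total_payload_bytes`, and `into_keys` are hypothetical names for illustration). It shows that idiomatic Rust passes large data by borrow or by move, so no deep copy happens unless `.clone()` is called explicitly, and that sharing a buffer with `Arc` copies only a pointer and bumps a reference count.

```rust
use std::sync::Arc;

// Hypothetical event type for illustration only; not taken from ArkFlow.
struct Event {
    key: String,
    payload: Vec<u8>,
}

// Borrows the batch: no event or payload is copied.
fn total_payload_bytes(batch: &[Event]) -> usize {
    batch.iter().map(|e| e.payload.len()).sum()
}

// Takes ownership of the batch: passing the Vec moves its (ptr, len, cap)
// header, not its heap buffer, and each key String is moved out, not cloned.
fn into_keys(batch: Vec<Event>) -> Vec<String> {
    batch.into_iter().map(|e| e.key).collect()
}

fn main() {
    let batch: Vec<Event> = (0..3)
        .map(|i| Event {
            key: format!("k{i}"),
            payload: vec![0u8; 1024],
        })
        .collect();

    // Borrowing: the caller keeps ownership and nothing is duplicated.
    let bytes = total_payload_bytes(&batch);
    println!("total bytes: {bytes}");

    // Sharing one immutable 1 MiB buffer between two handles: Arc::clone
    // copies a pointer and increments a reference count; the data is not copied.
    let shared: Arc<Vec<u8>> = Arc::new(vec![7u8; 1 << 20]);
    let reader = Arc::clone(&shared);
    assert!(Arc::ptr_eq(&shared, &reader));
    println!("refcount: {}", Arc::strong_count(&shared));

    // Moving: `batch` is consumed here and can no longer be used.
    let keys = into_keys(batch);
    println!("keys: {keys:?}");
}
```

Running it prints `total bytes: 3072`, `refcount: 2`, and `keys: ["k0", "k1", "k2"]`. A real copy is still one `.clone()` away when you need it; the design point is that copies are explicit rather than imposed.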

Stability and safety are the least of the concerns in data processing and database workloads. They are not at all the reason we saw a rise in these systems written in Java or similar alternative languages during the 90s and early 00s. The reasons were ease of use, a low barrier to entry into the ecosystem, and, more generally, the size of the available developer pool. Beyond that, cost is the main driver in infrastructure software, and it is exactly why we see many of these systems rewritten in C++. Rust is just another contender here, usually chosen for its performance and, recently, a lot of hype, which is fair.
Why do you have to beat a native performance-oriented streaming engine written in C or C++?

Currently, most mainstream stream processing engines are written in Java. Sorry, I left out qualifiers, which caused the misunderstanding.

There are no silver bullets in software, and the same goes for programming languages: each has its own strengths. I also like using Go and Java to develop software.

So if you don't aim to beat native engines on performance, what problem are you trying to solve that Java-based engines don't? I think it's pretty important to set a vision upfront; otherwise you're setting yourself up for a quick failure.
You are welcome to follow the latest news from ArkFlow at any time, and even to participate.