Hacker News
by StreamBright 1261 days ago
You can try the examples, or DataFusion with Flight. With that setup in Rust I have been able to process data in milliseconds that usually takes tens of seconds with a distributed query engine. I think Rust combined with Arrow, Flight, and Parquet can be a game changer for analytics after a decade of Java with Hadoop & co.
1 comment

Completely agree with this. Rust and Arrow will be part of the next set of tools for data engineering. Spark is great and I use it every day, but it's big and cumbersome to use. There are use cases today that are being addressed by DataFusion, DuckDB, and (to a certain extent) pandas, and those will continue to evolve. Hopefully Ballista can mature to the point where it's a real Spark alternative for distributed computation. Spark isn't standing still, of course, and we're already seeing a lot of different drop-in C++ SQL engines, but moving entirely away from the JVM would be a watershed, IMO.