Thanks for that. Now that I've had a chance to read through it, a question:
The examples seem to be implemented in pure Rust. No one is going to port their Spark jobs to Rust in the short term. Have you evaluated perf with Python etc?
If you're still seeing significant speedups, you might want to bottle this up and seek VC because a managed service along the lines of 'databricks but 10x faster' would certainly get traction.
It is at a very early POC stage and distributed mode is pretty basic, but it is moving faster than I expected. Python integration is definitely one of the primary objectives, as I suspect that no one is going to learn Rust for this, although I feel that it is not that hard. In fact, it can have a better integration story with Python than Spark does, as Rust has good C interop. Regarding performance, yeah, it is pretty good from what I have seen for CPU-intensive tasks, and once blockmanager is implemented with compression and other optimizations like Spark's, shuffle tasks will also improve. There are more unnecessary allocations here than I would prefer, just to keep it in safe Rust as much as possible, and there are still plenty of optimizations possible here. I am doing this in my free time only. I feel that it is too early to compare with Spark given how many features Spark has. Maybe in a couple of months, after it matures a bit and if there is enough traction for this, we can look for sponsors.