| I can appreciate the pain points you guys are addressing. The "diagonal scaling" approach seems particularly clever - dynamically choosing between horizontal and vertical scaling based on the query characteristics rather than forcing users into a one-size-fits-all model. Most real-world data workloads have mixed requirements, so this flexibility could be a major advantage. I'm curious how the new streaming engine with out-of-core processing will compare to Dask, which has been in this space for a while but hasn't quite achieved the adoption of pandas/PySpark despite its strengths. The unified API approach also tackles a real issue. The cognitive overhead of switching between pandas for local work and PySpark for distributed work is higher than most people acknowledge. Having a consistent mental model regardless of scale would be a productivity boost. Anyway, I would love to apply for the early access and try it out. I'd be particularly interested in seeing benchmark comparisons against Ray, Dask, and Spark for different workload profiles. Also curious about the pricing model and the cold start problem that plagues many distributed systems. |