I have the same thoughts. However, my impression is also that most orgs would choose e.g. Databricks or something similar for the permission handling, web UI, etc., so what is the equivalent «full rig» with DuckDB and S3 / blob storage?
Yeah, I think that's fair, especially from the 'end consumer of the data' point of view, and for things like row-level permissions.
For the ETL side, where whole-table access is often good enough, I find Spark in particular very cumbersome - there's more that can go wrong vs. DuckDB and it's harder to troubleshoot.